Adding a new column to a pandas DataFrame results in NaN

I have a pandas DataFrame data with the following transaction data:

           A         date
0      M000833  2016-08-01
1      M000833  2016-08-01
2      M000833  2016-08-02
3      M000833  2016-08-02 
4      M000511  2016-08-05

      

I need a new column with the number of visits for each consumer, where multiple visits on the same day count as one.

So I tried this:

import pandas as pd
data['noofvisits'] = data.groupby(['A'])['date'].nunique()

      

When I just run the statement without assigning it to the DataFrame, I get the pandas series with the desired output. However, the above statement results in:

           A         date       noofvisits
0      M000833  2016-08-01         NaN         
1      M000833  2016-08-01         NaN
2      M000833  2016-08-02         NaN
3      M000833  2016-08-02         NaN
4      M000511  2016-08-05         NaN

      

Expected Result:

           A         date       noofvisits
0      M000833  2016-08-01         2         
1      M000833  2016-08-01         2
2      M000833  2016-08-02         2
3      M000833  2016-08-02         2
4      M000511  2016-08-05         1

      

What's wrong with this approach? Why does the noofvisits column contain NaN values instead of the counts?



1 answer


Use transform to generate a Series whose index is aligned with the original df:

In[32]:
df['noofvisits'] = df.groupby(['A'])['date'].transform('nunique')
df

Out[32]: 
             A        date  noofvisits
index                                 
0      M000833  2016-08-01           2
1      M000833  2016-08-01           2
2      M000833  2016-08-02           2
3      M000833  2016-08-02           2
4      M000511  2016-08-05           1

      

The problem with direct assignment is that you are grouping on the column 'A', so it becomes the index of the groupby aggregation. When you then assign the result to your df, the indexes do not match, hence the column of NaN values.
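To make the mismatch concrete, here is a minimal sketch that rebuilds the sample data from the question and compares the two indexes. The variable names per_customer and shared are illustrative, not part of the original code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

# Aggregation: one value per customer, labelled by the values of column 'A'.
per_customer = df.groupby("A")["date"].nunique()

print(per_customer.index.tolist())  # customer IDs
print(df.index.tolist())            # row positions 0..4

# Column assignment aligns on index labels. The two indexes share no labels,
# so every row of the new column is filled with NaN.
shared = set(df.index) & set(per_customer.index)
df["noofvisits"] = per_customer
print(df)
```

The assignment does not raise an error because pandas reindexes the Series to the frame's index and fills missing labels with NaN.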



Also, even if the index values were the same, the shape is different anyway, since the aggregated result has one row per group rather than one per original row:

In[33]:
df.groupby(['A'])['date'].nunique()

Out[33]: 
A
M000511    1
M000833    2
Name: date, dtype: int64
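If you already have the aggregated Series, another way to broadcast it back to the original rows is to look up each row's 'A' value in it with Series.map. This is a sketch of an alternative to transform, not something the answer itself proposes. The name per_customer is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

# One distinct-date count per customer, indexed by customer ID.
per_customer = df.groupby("A")["date"].nunique()

# Map each row's customer ID to its count; the result keeps df's row index.
df["noofvisits"] = df["A"].map(per_customer)

# For comparison, the transform approach from the answer.
via_transform = df.groupby("A")["date"].transform("nunique")
print(df)
```

Both approaches yield one value per original row. transform does it in a single step, while map is handy when the per-group Series is computed elsewhere.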

      
