Adding a new column to a pandas DataFrame results in NaN
I have a pandas DataFrame data with the following transaction data:
A date
0 M000833 2016-08-01
1 M000833 2016-08-01
2 M000833 2016-08-02
3 M000833 2016-08-02
4 M000511 2016-08-05
I need a new column with the number of visits for each customer, where multiple visits on the same day count as one.
So I tried this:
import pandas as pd
data['noofvisits'] = data.groupby(['A'])['date'].nunique()
When I just run the statement without assigning it to the DataFrame, I get the pandas series with the desired output. However, the above statement results in:
A date noofvisits
0 M000833 2016-08-01 NaN
1 M000833 2016-08-01 NaN
2 M000833 2016-08-02 NaN
3 M000833 2016-08-02 NaN
4 M000511 2016-08-05 NaN
Expected Result:
A date noofvisits
0 M000833 2016-08-01 2
1 M000833 2016-08-01 2
2 M000833 2016-08-02 2
3 M000833 2016-08-02 2
4 M000511 2016-08-05 1
What's wrong with this approach? Why does the noofvisits column contain NaN values instead of the counts?
Use transform to generate a Series whose index is aligned with the original df:
In[32]:
df['noofvisits'] = df.groupby(['A'])['date'].transform('nunique')
df
Out[32]:
A date noofvisits
index
0 M000833 2016-08-01 2
1 M000833 2016-08-01 2
2 M000833 2016-08-02 2
3 M000833 2016-08-02 2
4 M000511 2016-08-05 1
The problem with direct assignment is that you are grouping on the column 'A', so 'A' becomes the index of the aggregated groupby result. When you then assign that result to your df, pandas aligns on index labels, and the labels of the result (the values of 'A') do not match the labels of df (0 to 4), so every row gets NaN.
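To make the misalignment concrete, here is a minimal sketch that rebuilds the question's data and compares the two indexes. The way the frame is constructed and the names per_customer and broken are assumptions for illustration only.

```python
import pandas as pd

# Rebuild the sample data from the question (construction is assumed)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": pd.to_datetime([
        "2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05",
    ]),
})

# The aggregated result is indexed by the group keys, not by row number
per_customer = df.groupby("A")["date"].nunique()
print(list(per_customer.index))  # group keys: the distinct values of column A
print(list(df.index))            # row labels of the original frame

# Column assignment aligns on index labels; no label is shared,
# so every row of the new column is filled with NaN
broken = df.copy()
broken["noofvisits"] = per_customer
print(broken["noofvisits"].isna().all())
```

The assignment does not raise an error because pandas treats it as a reindexing operation and silently fills labels it cannot find.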
Also, even if the index values were the same, the shape is different anyway:
In[33]:
df.groupby(['A'])['date'].nunique()
Out[33]:
A
M000511 1
M000833 2
Name: date, dtype: int64
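If you already have the aggregated Series and want to reuse it, one alternative to transform is to look up each row's 'A' value in it with Series.map. This is a sketch under the assumption that the data looks like the sample in the question; the name visits is hypothetical.

```python
import pandas as pd

# Sample data from the question (construction is assumed)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": pd.to_datetime([
        "2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05",
    ]),
})

# One value per customer, indexed by the values of column A
visits = df.groupby("A")["date"].nunique()

# Look up each row's customer in the aggregated Series;
# the result keeps df's own index, so the assignment aligns
df["noofvisits"] = df["A"].map(visits)
print(df)
```

For this case transform('nunique') is shorter, but the map form can be handy when the per-group Series is needed elsewhere as well.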