Groupby and count the number of unique values (Pandas)
I have a dataframe with two columns, ID and outcome. First I tried to group by ID and count the number of unique outcome values within each ID.
df
ID outcome
1 yes
1 yes
1 yes
2 no
2 yes
2 no
Expected Result:
ID yes no
1 3 0
2 1 2
My code df[['ID', 'outcome']].groupby('ID')['outcome'].nunique()
returns the number of unique outcome values per ID, for example:
ID
1 2
2 2
But I need separate counts for yes and no. How can I achieve this? Thanks!
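For reference, the sample frame above can be reproduced like this (a minimal sketch; the column names are taken from the question):

import pandas as pd

# Reconstruct the example dataframe shown in the question
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'outcome': ['yes', 'yes', 'yes', 'no', 'yes', 'no'],
})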
How about pd.crosstab?
In [1217]: pd.crosstab(df.ID, df.outcome)
Out[1217]:
outcome no yes
ID
1 0 3
2 2 1
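If the columns should come out in the yes/no order shown in the expected result, you can reorder the crosstab afterwards (a small follow-up sketch):

counts = pd.crosstab(df.ID, df.outcome)
counts = counts[['yes', 'no']]  # put yes before no, as in the expected output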
Option 2: pd.factorize + np.bincount
It's confusing and painful... but very fast.
import numpy as np
import pandas as pd

fi, ui = pd.factorize(df.ID.values)
fo, uo = pd.factorize(df.outcome.values)
n, m = ui.size, uo.size
pd.DataFrame(
    np.bincount(fi * m + fo, minlength=n * m).reshape(n, m),
    pd.Index(ui, name='ID'), pd.Index(uo, name='outcome')
)
outcome yes no
ID
1 3 0
2 1 2
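The recipe can be packaged as a reusable helper; fast_crosstab here is a hypothetical name, not part of pandas (sketch only):

import numpy as np
import pandas as pd

def fast_crosstab(ids, outcomes):
    # Encode both columns as integer codes, then count each (id, outcome)
    # pair with a single bincount over the combined code.
    fi, ui = pd.factorize(ids)
    fo, uo = pd.factorize(outcomes)
    n, m = ui.size, uo.size
    counts = np.bincount(fi * m + fo, minlength=n * m).reshape(n, m)
    return pd.DataFrame(counts,
                        index=pd.Index(ui, name='ID'),
                        columns=pd.Index(uo, name='outcome'))

fast_crosstab(df.ID.values, df.outcome.values)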
Option 3: pd.get_dummies
pd.get_dummies(df.ID).T.dot(pd.get_dummies(df.outcome))
no yes
1 0 3
2 2 1
Option 4
df.groupby(['ID', 'outcome']).size().unstack(fill_value=0)
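To match the exact layout asked for in the question (ID back as a column, yes before no), the result can be reshuffled; this is a sketch, not the only way:

result = df.groupby(['ID', 'outcome']).size().unstack(fill_value=0)
result = result[['yes', 'no']].reset_index()  # reorder columns and restore ID as a column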
Group by the ID column, then aggregate the outcome column with value_counts. This returns a Series, so convert it back to a DataFrame with .to_frame() so that you can unstack yes/no (i.e. have them as columns). Then fill the missing values with zero.
df_total = df.groupby('ID')['outcome'].value_counts().to_frame().unstack(fill_value=0)
df_total.columns = df_total.columns.droplevel()
>>> df_total
outcome no yes
ID
1 0 3
2 2 1
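As a possible simplification, value_counts() already returns a Series with an (ID, outcome) MultiIndex, so it can be unstacked directly, skipping the to_frame()/droplevel() round trip (sketch):

df_total = df.groupby('ID')['outcome'].value_counts().unstack(fill_value=0)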