Pandas: how to group and count with multiple levels in the rows?
I have the following dataframe:
| A  | B  |
|----|----|
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |
I want to count the occurrences of each B within each A and get the following result:
| A  | B  | Count |
|----|----|-------|
| a1 | b1 | 1     |
|    | b2 | 1     |
|    | b3 | NaN   |
| a2 | b1 | 1     |
|    | b2 | NaN   |
|    | b3 | 1     |
I usually do it with df.groupby(['B'])['A'].count(), but in this case, where the result has to look like a pivot table with every combination listed, it confuses me.
Thanks in advance.
Update:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B 20422 non-null object
A 20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB
With df.groupby(['B'])['A'].value_counts().unstack().stack(dropna=False).reset_index(name="Count") I am getting:
|   | A  | B  | Count |
|---|----|----|-------|
| 0 | a1 | b1 | 1     |
| 1 | a1 | b2 | 1     |
| 2 | a1 | b3 | NaN   |
| 3 | a2 | b1 | 1     |
| 4 | a2 | b2 | NaN   |
| 5 | a2 | b3 | 1     |
1) One way is to group by "A" and count the distinct elements under "B" with value_counts. Then chain unstack and stack with dropna=False, so that the missing combinations are kept as NaN, to get the desired DataFrame:
df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")
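As a runnable sketch of the same idea, the snippet below rebuilds the sample frame from the question and fills in the missing combinations. Note that it swaps the unstack/stack round trip for a reindex over `pd.MultiIndex.from_product`, on the assumption that this behaves the same across pandas versions; the names `counts`, `full_index` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Count each (A, B) pair that actually occurs in the data.
counts = df.groupby(["A", "B"]).size()

# Build every A/B combination, including pairs that never occur.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)

# Reindexing onto the full grid introduces NaN for the missing pairs.
result = counts.reindex(full_index).reset_index(name="Count")
print(result)
```

The reindex step makes the "all combinations" requirement explicit, which may be easier to follow than reshaping twice.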
2) pd.crosstab is also a good alternative, provided we replace the zero counts with np.nan after stacking:
pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
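For completeness, here is the crosstab variant as a self-contained sketch with its imports and the sample data from the question. The variable names `table` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# crosstab yields a dense A x B grid, with 0 where a pair never occurs.
table = pd.crosstab(df["A"], df["B"])

# Stack back to long form, then turn the zero counts into NaN.
result = (
    table.stack()
         .replace({0: np.nan})
         .reset_index(name="Count")
)
print(result)
```

Because crosstab fills absent pairs with 0 rather than NaN, no rows are lost when stacking, which is why dropna=False is not needed here.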
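Both approaches give the frame shown in the question's update: one row per A/B combination, with NaN where a combination does not occur.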
Edit 1:
To display the grouping key "A" in a specific format (i.e. keep the first occurrence and replace the rest with an empty string):
df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""
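A minimal end-to-end sketch of this blanking step, using the sample data from the question, could look like the following. It assumes the rows are already ordered by "A", which is the case after crosstab and stack.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Long-form counts with NaN for the combinations that do not occur.
df_g = (
    pd.crosstab(df["A"], df["B"])
      .stack()
      .replace({0: np.nan})
      .reset_index(name="Count")
)

# duplicated('A') is True for every repeat of a key after its first row,
# so only the first row of each group keeps its label.
df_g.loc[df_g.duplicated("A"), "A"] = ""
print(df_g)
```

This only changes how the frame is displayed. The blanked cells no longer carry the key, so it is probably best done as a last step before printing or exporting.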
Edit 2:
If you want "A" to appear as a single merged cell, as part of a multi-indexed DataFrame:
df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count").set_index(['A', 'B'])
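The sketch below shows what the multi-indexed form allows once it is built. It produces the same shape through a reindex over `pd.MultiIndex.from_product` rather than the unstack/stack chain, which is an assumption made to keep the example independent of the pandas version; the name `indexed` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Every A/B combination, used as the target index.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)

# Counts indexed by (A, B), with NaN for the missing pairs.
indexed = df.groupby(["A", "B"]).size().reindex(full_index).to_frame("Count")

# pandas prints each outer key once per group, like a merged cell.
print(indexed)

# Select all rows for one value of A, or a single (A, B) cell.
print(indexed.loc["a1"])
print(indexed.loc[("a2", "b3"), "Count"])
```

Unlike the empty-string approach, the key stays attached to every row here, so selection and further grouping keep working.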