Pandas: how to group and count with multiple index levels in rows?

I have the following dataframe

| A  | B  |
|----|----|
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |


I want to count the occurrences of B within each A and get the following result:

| A  | B  | Count |
|----|----|-------|
| a1 | b1 | 1     |
|    | b2 | 1     |
|    | b3 | NaN   |
| a2 | b1 | 1     |
|    | b2 | NaN   |
|    | b3 | 1     |


I usually do it with `df.groupby(['B'])['A'].count()`, but in this case an arbitrary pivot table confuses me.

Thanks in advance.

UPDATE:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B    20422 non-null object
A    20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB


With `df.groupby(['A'])['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")` I am getting:

|   | A  | B  | Count |
|---|----|----|-------|
| 0 | a1 | b1 | 1     |
| 1 | a1 | b2 | 1     |
| 2 | a1 | b3 | NaN   |
| 3 | a2 | b1 | 1     |
| 4 | a2 | b2 | NaN   |
| 5 | a2 | b3 | 1     |




2 answers


1) One way is to group by `"A"` and compute the individual counts of the elements under `"B"` with `value_counts`. Then chain `unstack` and `stack` with `dropna=False` to get the `DF` you want:

df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")


2) `pd.crosstab` is also a good alternative if we replace the zero-count entries with `np.nan` after stacking:

pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")


Both approaches give the desired result.
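As a sanity check, here is a minimal, self-contained sketch of both approaches; the DataFrame construction is my assumption, built from the sample data in the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Approach 1: per-group value_counts, with unstack/stack to keep NaN rows
out1 = (df.groupby('A')['B'].value_counts()
          .unstack().stack(dropna=False)
          .reset_index(name="Count"))

# Approach 2: crosstab, then turn zero counts into NaN
out2 = (pd.crosstab(df['A'], df['B'])
          .stack().replace({0: np.nan})
          .reset_index(name="Count"))

print(out2)
```

Both produce one row per (A, B) combination, with a NaN Count for pairs that never occur.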


edit1:

To display the grouping key `"A"` in the requested format (i.e. keep the first occurrence and replace the repeats with an empty string):

df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""


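A quick runnable check of the blanking step, again assuming the sample data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

df_g = (pd.crosstab(df['A'], df['B'])
          .stack().replace({0: np.nan})
          .reset_index(name="Count"))

# Blank out every repeated occurrence of the group key "A"
df_g.loc[df_g.duplicated('A'), "A"] = ""
print(df_g["A"].tolist())  # ['a1', '', '', 'a2', '', '']
```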

edit2:

If you want `"A"` to appear as a single merged label, as part of a multi-indexed `DF`:

(df.groupby('A')['B'].value_counts()
   .unstack().stack(dropna=False)
   .reset_index(name="Count")
   .set_index(['A', 'B']))


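For illustration, a self-contained version of the multi-index variant (the sample frame is assumed, as above):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

out = (df.groupby('A')['B'].value_counts()
         .unstack().stack(dropna=False)
         .reset_index(name="Count")
         .set_index(['A', 'B']))

# "A" now shows once per group in the MultiIndex display
print(out)
```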



You can group by both columns and take the size of each group:

df.groupby(['A', 'B']).size()


returns:



A   B 
a1  b1    1
    b2    1
a2  b1    1
    b3    1
dtype: int64

This will not give you `NaN` for non-existing combinations, though.
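If the missing pairs are needed as NaN here too, one option (my addition, not part of the answer above) is to round-trip the `size()` result through `unstack`/`stack`:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

counts = df.groupby(['A', 'B']).size()

# The unstack/stack round-trip reintroduces NaN for absent combinations
full = counts.unstack().stack(dropna=False)
print(full)
```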
