Count occurrences of specific values in a data frame, where all possible values are specified by a list
I have two categories, A and B, each of which can take one of 5 different states (values, names, or categories) defined by the list list('abcde'). Counting the occurrences of each state and storing the result in a data frame is fairly straightforward. However, I would also like the resulting data frame to include zeros for the possible values that do not occur in Category A or Category B.
First, here's a data frame that matches the description:
In [1]:
import pandas as pd
possibleValues = list('abcde')
df = pd.DataFrame({'Category A':list('abbc'), 'Category B':list('abcc')})
print(df)
Out[1]:
Category A Category B
0 a a
1 b b
2 b c
3 c c
I have tried different approaches with df.groupby(...).size() and .count(), combined with the list of possible values and the category names, with no success.
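To illustrate the difficulty, here is a minimal sketch on the same toy data: value_counts and groupby(...).size() only report values that actually occur, so 'd' and 'e' are simply absent from the result instead of being counted as 0.

```python
import pandas as pd

possibleValues = list('abcde')
df = pd.DataFrame({'Category A': list('abbc'), 'Category B': list('abcc')})

# value_counts only lists states that actually occur in the column
observed = df['Category A'].value_counts()
print(sorted(observed.index))  # ['a', 'b', 'c']: no entries for 'd' or 'e'

# The same holds for groupby(...).size(): unseen values get no group at all
sizes = df.groupby('Category A').size()
print(list(sizes.index))  # ['a', 'b', 'c']
```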
Here's the desired output:
Category A Category B
a 1 1
b 2 1
c 1 2
d 0 0
e 0 0
To take it one step further, I would also like to include a column with totals for each possible state across all categories:
Category A Category B Total
a 1 1 2
b 2 1 3
c 1 2 3
d 0 0 0
e 0 0 0
SO has many related questions and answers, but as far as I know, none of them offer a solution to this specific problem. Thanks for any suggestions!
P.S. I would like the solution to work for any number of categories, any set of possible values, and any number of rows.
You need apply with value_counts, then reindex to add the missing values with 0, then sum for the totals:
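cols = ['Category A','Category B']
# Count each column, then reindex so every possible value appears (unseen ones get 0)
df1 = df[cols].apply(lambda s: s.value_counts().reindex(possibleValues, fill_value=0))
# Row-wise sum across the category columns gives the total for each state
df1['total'] = df1.sum(axis=1)
print(df1)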
Category A Category B total
a 1 1 2
b 2 1 3
c 1 2 3
d 0 0 0
e 0 0 0
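Since the P.S. asks for a solution that works for any number of categories and possible values, here is a minimal sketch that wraps the first approach in a function. The name count_states and the extra 'Category C' column are hypothetical, chosen only for illustration.

```python
import pandas as pd

def count_states(frame, columns, states, total_name='total'):
    """Hypothetical helper: count every state in `states` for each column.

    States that never occur in a column get 0; values not listed in
    `states` are ignored.
    """
    # Reindexing each column's counts separately keeps the result integer,
    # even when the columns contain different subsets of the states
    counts = frame[columns].apply(
        lambda s: s.value_counts().reindex(states, fill_value=0)
    )
    counts[total_name] = counts.sum(axis=1)
    return counts

possibleValues = list('abcde')
df = pd.DataFrame({'Category A': list('abbc'),
                   'Category B': list('abcc'),
                   'Category C': list('eeda')})  # hypothetical third category
result = count_states(df, ['Category A', 'Category B', 'Category C'], possibleValues)
print(result)
```

Reindexing inside the lambda, rather than after apply, avoids the NaN values (and float dtype) that apply(pd.Series.value_counts) produces when one column contains a value that another column lacks.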
Another solution is to convert the columns to a categorical dtype whose categories are all the possible values; value_counts then reports 0 for the unused categories, so no reindex is needed:
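cols = ['Category A','Category B']
# Categorical dtype that knows about every possible value, even unused ones
cat_type = pd.CategoricalDtype(categories=possibleValues)
# sort=False keeps the counts in category order (a, b, c, d, e) instead of by frequency
df1 = df[cols].apply(lambda x: x.astype(cat_type).value_counts(sort=False))
df1['total'] = df1.sum(axis=1)
print(df1)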
Category A Category B total
a 1 1 2
b 2 1 3
c 1 2 3
d 0 0 0
e 0 0 0
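Since the question mentions trying groupby(...).size(), here is a sketch of how that route can also produce the table, assuming the same df and possibleValues. It is an alternative technique, not part of the answer above: melt the category columns into one long column, count (state, category) pairs, unstack the categories into columns, and reindex to add the unseen states.

```python
import pandas as pd

possibleValues = list('abcde')
df = pd.DataFrame({'Category A': list('abbc'), 'Category B': list('abcc')})
cols = ['Category A', 'Category B']

# One row per observation: which category it came from and its state
long = df[cols].melt(var_name='category', value_name='state')
counts = (
    long.groupby(['state', 'category']).size()  # counts observed pairs only
    .unstack(fill_value=0)                       # categories become columns
    .reindex(index=possibleValues, columns=cols, fill_value=0)  # add unseen states
)
counts['total'] = counts.sum(axis=1)
print(counts)
```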