Pandas, group by count and add score to original dataframe?

Question

Pandas, group by count and add score to original dataframe?

When trying to count rows with a similar "look" in a data frame:

import pandas as pd

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
         ('bb', 'bb text 4'),
         ('cccc','cccc text 1'), ('cccc','cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1'),
        ]

df = pd.DataFrame(items, columns=['kind', 'msg'])
df

    kind    msg
0   aaa     aaa text 1
1   aaa     aaa text 2
2   aaa     aaa text 3
3   bb      bb text 1
4   bb      bb text 2
5   bb      bb text 3
6   bb      bb text 4
7   cccc    cccc text 1
8   cccc    cccc text 2
9   dd      dd text 1
10  e       e text 1
11  fff     fff text 1

This code works:

df = df[['kind']].groupby(['kind'])['kind'] \
                         .count() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(5)

df

Result:

    kind      count
    0   aaa   1
    1   bb    1
    2   cccc  1
    3   dd    1
    4   e     1

But how can you get a data frame with all columns as in the original plus column? So the result should have columns "view", "msg", "count" in that order?

Also, how to sort the resulting resulting dataframe in descending order of count?

+3

pandas dataframe

dokondr Jul 27. 17 at 9:42 am

source to share

1 answer

MaxU · Accepted Answer · 2017-07-27T09:48:34+0000

IIUC

In [247]: df['count'] = df.groupby('kind').transform('count')

In [248]: df
Out[248]:
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

Sorting:

In [249]: df.sort_values('count', ascending=False)
Out[249]:
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

Pandas, group by count and add score to original dataframe?

More articles: