Group using aggregation function as new field in pandas

Question

If I run the following GROUP BY query on a MySQL table

SELECT col1, count(col2) * count(distinct(col3)) as agg_col
FROM my_table
GROUP BY col1

I get a table with two columns:

col1 agg_col

How can I do the same on a pandas frame?

Suppose I have a DataFrame with three columns: col1, col2 and col3. The group operation

grouped = my_df.groupby('col1')

will return the data grouped by col1.

In addition,

agg_col_series = grouped.col2.size() * grouped.col3.nunique()

will return an aggregated column equivalent to the one specified in the SQL query. But how can I add this to the grouped data?

+3

python pandas mysql

Apostolos 01 jul. 17 at 12:42

2 answers

Let's use groupby with a lambda function that uses size and nunique, then rename the resulting series to 'agg_col' and call reset_index to get a DataFrame.

import pandas as pd
import numpy as np

np.random.seed(443)
df = pd.DataFrame({'Col1':np.random.choice(['A','B','C'],50),
                   'Col2':np.random.randint(1000,9999,50),
                   'Col3':np.random.choice(['A','B','C','D','E','F','G','H','I','J'],50)})

df_out = df.groupby('Col1').apply(lambda x: x.Col2.size * x.Col3.nunique()).rename('agg_col').reset_index()

Output:

  Col1  agg_col
0    A      120
1    B       96
2    C      190
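
To check the pattern on data small enough to verify by hand, here is a minimal sketch. The frame `small` and its values are made up for illustration and are not part of the example above. Selecting Col2 and Col3 explicitly before apply keeps the grouping column out of the lambda, which newer pandas versions prefer.

```python
import pandas as pd

# Hypothetical toy data, small enough to verify by hand
small = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'B', 'B'],
    'Col2': [10, 20, 30, 40, 50],
    'Col3': ['x', 'y', 'x', 'z', 'z'],
})

# Per group: (number of rows) * (number of distinct Col3 values)
result = (
    small.groupby('Col1')[['Col2', 'Col3']]
         .apply(lambda g: g['Col2'].size * g['Col3'].nunique())
         .rename('agg_col')   # name the resulting Series
         .reset_index()       # turn the group keys back into a column
)
print(result)
```

Group A has 3 rows and 2 distinct Col3 values, so agg_col is 6; group B has 2 rows and 1 distinct value, so agg_col is 2.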

+1

Scott boston 01 jul. 17 at 15:02

NickBraunagel · Accepted Answer · 2017-07-01T15:07:52+0000

I'd need to see your data to be sure, but I think you just need to reset the index of agg_col_series:

agg_col_series.reset_index(name='agg_col')

Complete example with dummy data:

import random
import pandas as pd

col1 = [random.randint(1,5) for x in range(1,1000)]
col2 = [random.randint(1,100) for x in range(1,1000)]
col3 = [random.randint(1,100) for x in range(1,1000)]

df = pd.DataFrame(data={
        'col1': col1,
        'col2': col2,
        'col3': col3,
    })

grouped = df.groupby('col1')
agg_col_series = grouped.col2.size() * grouped.col3.nunique()

print(agg_col_series.reset_index(name='agg_col'))

   col1  agg_col
0     1    15566
1     2    20056
2     3    17313
3     4    17304
4     5    16380
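
One caveat worth checking, since the SQL uses count(col2): COUNT on a column skips NULLs, while size() counts every row in the group. If col2 can contain missing values, count() is probably the closer match to the query. A minimal sketch of the difference, using made-up data (`df_nan` is a hypothetical frame, not the one above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in col2
df_nan = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2],
    'col2': [10.0, np.nan, 30.0, 40.0, 50.0],
    'col3': ['x', 'y', 'x', 'z', 'z'],
})

g = df_nan.groupby('col1')

# size() counts all rows in each group, including the NaN row
with_size = (g['col2'].size() * g['col3'].nunique()).reset_index(name='agg_col')

# count() counts only non-missing col2 values, like SQL COUNT(col2)
with_count = (g['col2'].count() * g['col3'].nunique()).reset_index(name='agg_col')

print(with_size)
print(with_count)
```

For col1 == 1 the two versions differ (3 * 2 = 6 with size, 2 * 2 = 4 with count); for col1 == 2 both give 2. With no missing values in col2, the two approaches agree.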
