Pandas group by aggregate using division
I am wondering how to aggregate data in a grouped pandas dataframe using a function that takes into account the value stored in another column of the dataframe. This would be useful for operations where the order of the operands matters, such as division.
For example, I have:
In [8]: df
Out[8]:
  class  cat  xer
0     a    1    2
1     b    1    4
2     c    1    9
3     a    2    6
4     b    2    8
5     c    2    3
I want to group by class and, for each class, divide the value of xer corresponding to cat == 1 by the one for cat == 2. In other words, the entries in the final output should be:
  class   div
0     a  0.33  (i.e. 2/6)
1     b  0.5   (i.e. 4/8)
2     c  3     (i.e. 9/3)
Can this be done with groupby? I can't figure out how to do this without manually iterating through each class, and even then the result is not clean.
Given your DataFrame, you can use the following:
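from functools import reduce  # reduce is not a builtin in Python 3; pd.np is gone from pandas 2.x, so divide directly
df.groupby('class').agg({'xer': lambda L: reduce(lambda a, b: a / b, L)})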
Which gives you:
            xer
class
a      0.333333
b      0.500000
c      3.000000
This also handles more than 2 rows per group (the divisions are simply chained), but you should sort your df by cat first so that the values appear in the correct order.
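To make the ordering caveat concrete, here is a minimal sketch that sorts by cat before grouping. It rebuilds the question's frame with the rows deliberately shuffled; the names ordered and div, and the use of operator.truediv in place of a NumPy function, are my own choices for illustration rather than anything from the answer above.

```python
from functools import reduce
import operator

import pandas as pd

# Example frame from the question, with the cat == 2 rows listed first
# to show that the sort, not the input order, decides the result.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [2, 2, 2, 1, 1, 1],
    "xer": [6, 8, 3, 2, 4, 9],
})

# Sort by cat so each group's values arrive as cat == 1, then cat == 2.
ordered = df.sort_values("cat")

# Chain the division left to right within each class.
div = ordered.groupby("class")["xer"].agg(lambda s: reduce(operator.truediv, s))
div = div.rename("div").reset_index()
print(div)
```

Without the sort_values call, the shuffled frame would compute 6/2, 8/4 and 3/9 instead, which is why the sort matters here.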
Here it is step by step:
# get cat==1 and cat==2 merged by class
grouped = df[df.cat==1].merge(df[df.cat==2], on='class')
# calculate div
grouped['div'] = grouped.xer_x / grouped.xer_y
# return the final dataframe
grouped[['class', 'div']]
which gives:
  class       div
0     a  0.333333
1     b  0.500000
2     c  3.000000
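The same merge can be written with explicit suffixes, which may read more clearly than the default xer_x and xer_y. This is only a sketch of that variation: the names num, den, merged and result, and the suffixes _num and _den, are my own and not part of the answer above.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# Numerator rows (cat == 1) and denominator rows (cat == 2).
num = df.loc[df["cat"] == 1, ["class", "xer"]]
den = df.loc[df["cat"] == 2, ["class", "xer"]]

# Inner merge on class; the suffixes name the two xer columns explicitly.
merged = num.merge(den, on="class", suffixes=("_num", "_den"))
merged["div"] = merged["xer_num"] / merged["xer_den"]

result = merged[["class", "div"]]
print(result)
```

Because merge defaults to an inner join, a class that lacks either a cat == 1 or a cat == 2 row is dropped from the result rather than producing NaN.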
You can reshape the data to make it easier to see:
df2 = df.set_index(['class', 'cat']).unstack()
>>> df2
      xer
cat     1  2
class
a       2  6
b       4  8
c       9  3
Then you can do the following to get the desired result:
>>> df2.iloc[:,0].div(df2.iloc[:, 1])
class
a 0.333333
b 0.500000
c 3.000000
Name: (xer, 1), dtype: float64
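A closely related sketch uses pivot and selects the columns by their cat label instead of by position, so the division does not depend on column order. The names wide and div are my own, and this assumes each (class, cat) pair occurs exactly once, as in the example data.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# One row per class, one column per cat value.
wide = df.pivot(index="class", columns="cat", values="xer")

# Select by label: column 1 is cat == 1, column 2 is cat == 2.
div = (wide[1] / wide[2]).rename("div")
print(div)
```

If a (class, cat) pair were duplicated, pivot would raise an error, which at least makes the ambiguity visible instead of silently picking a value.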