Pandas group by aggregate using division

Question

I am wondering how to aggregate data in a grouped pandas dataframe using a function where I take into account the value stored in some column of the dataframe. This would be useful in operations where there is an order of operations, such as division.

For example, I have:

In [8]: df
Out[8]: 
  class cat  xer
0     a   1    2
1     b   1    4
2     c   1    9
3     a   2    6
4     b   2    8
5     c   2    3

I want to group by class and, for each class, divide the value of xer corresponding to cat == 1 by the value corresponding to cat == 2. In other words, the entries in the final output should be:

  class    div
0     a   0.33  (i.e. 2/6)
1     b    0.5  (i.e. 4/8)
2     c      3  (i.e. 9/3)

Can this be done with groupby? I can't figure out how to do it without manually iterating through each class, and even then the result is neither clean nor elegant.

+3

python pandas aggregate

crackedegg May 08 '15 at 22:14

4 answers

Don't do anything too smart:

In [11]: one = df[df["cat"] == 1].set_index("class")["xer"]

In [12]: two = df[df["cat"] == 2].set_index("class")["xer"]

In [13]: one / two
Out[13]:
class
a    0.333333
b    0.500000
c    3.000000
Name: xer, dtype: float64
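The division works because both Series are indexed by class, so pandas lines the values up by label before dividing. A minimal self-contained sketch, assuming the sample frame from the question (the `div` column name is borrowed from the question's desired output and is not part of the answer above):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

one = df[df["cat"] == 1].set_index("class")["xer"]
two = df[df["cat"] == 2].set_index("class")["xer"]

# Division aligns on the shared "class" index, not on row position.
result = (one / two).rename("div").reset_index()
print(result)
```

The final `reset_index()` turns the class labels back into a regular column, which matches the two-column layout asked for in the question.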

+2

Andy Hayden May 08 '15 at 22:37

Here it is step by step:

# get cat==1 and cat==2 merged by class
grouped = df[df.cat==1].merge(df[df.cat==2], on='class')
# calculate div
grouped['div'] = grouped.xer_x / grouped.xer_y
# return the final dataframe
grouped[['class', 'div']]

which gives:

  class       div
0     a  0.333333
1     b  0.500000
2     c  3.000000
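The `xer_x` and `xer_y` names come from the default suffixes that `merge` appends to overlapping column names. As a hedged variation, the `suffixes` parameter can make the numerator and denominator explicit. The names `_num` and `_den` below are illustrative choices, and the frame is the sample from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# Bracket access avoids any clash between column names and attributes.
numerator = df[df["cat"] == 1]
denominator = df[df["cat"] == 2]

# "_num" / "_den" are hypothetical suffixes chosen for readability.
merged = numerator.merge(denominator, on="class", suffixes=("_num", "_den"))
merged["div"] = merged["xer_num"] / merged["xer_den"]

result = merged[["class", "div"]]
print(result)
```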

0

miraculixx May 08 '15 at 22:47

You can reshape the data so that each cat value becomes its own column, which makes the calculation easier to see:

df2 = df.set_index(['class', 'cat']).unstack()

>>> df2
       xer   
cat      1  2
class        
a        2  6
b        4  8
c        9  3

Then you can do the following to get the desired result:

>>> df2.iloc[:,0].div(df2.iloc[:, 1])

class
a        0.333333
b        0.500000
c        3.000000
Name: (xer, 1), dtype: float64
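A similar reshape can be sketched with `DataFrame.pivot`, which gives a single-level column index, so the two columns can be selected by their cat label rather than by position. This is an alternative under the assumption that each (class, cat) pair occurs exactly once, as in the question's sample data:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# One row per class, one column per cat value (columns are the ints 1 and 2).
wide = df.pivot(index="class", columns="cat", values="xer")

# Select by label so the result does not depend on column order.
result = (wide[1] / wide[2]).rename("div")
print(result)
```

`pivot` raises an error if a (class, cat) pair is duplicated, which can be a useful check when the data is expected to have one value per pair.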

0

Alexander May 08 '15 at 22:53

Jon Clements · Accepted Answer · 2015-05-08T22:39:21+0000

Given your DataFrame, you can use the following:

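from functools import reduce  # reduce is not a builtin on Python 3
df.groupby('class').agg({'xer': lambda L: reduce(lambda a, b: a / b, L)})  # avoids pd.np, which newer pandas no longer provides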

Which gives you:

            xer
class          
a      0.333333
b      0.500000
c      3.000000

This also handles more than 2 rows per group (if needed), but you may want to make sure your df is sorted by cat first so that the values appear in the correct order.
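To illustrate why the row order matters, here is a small sketch that reverses the question's sample frame and compares the reduction with and without sorting by cat. It uses `operator.truediv` as the division function, which is a substitution made for this example rather than the call used in the answer:

```python
from functools import reduce
import operator

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

def divide_in_order(values):
    # Left-to-right division: v0 / v1 / v2 / ...
    return reduce(operator.truediv, values)

# Reverse the rows so that cat == 2 comes before cat == 1 in each group.
shuffled = df.iloc[::-1]

unsorted_result = shuffled.groupby("class")["xer"].agg(divide_in_order)
sorted_result = (
    shuffled.sort_values("cat").groupby("class")["xer"].agg(divide_in_order)
)

print(unsorted_result)  # cat 2 divided by cat 1: the inverse of what was asked
print(sorted_result)    # cat 1 divided by cat 2
```

groupby keeps the rows of each group in their original relative order, so sorting by cat before grouping is what fixes the direction of the division.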
