Pandas group by aggregate using division

I am wondering how to aggregate data in a grouped pandas DataFrame using a function that takes into account the value stored in another column of the DataFrame. This would be useful for operations where the order of the operands matters, such as division.

For example, I have:

In [8]: df
Out[8]: 
  class cat  xer
0     a   1    2
1     b   1    4
2     c   1    9
3     a   2    6
4     b   2    8
5     c   2    3
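To reproduce the example, the frame above can be rebuilt like this. The values are copied from the printed output; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example frame from the question so the answers can be run directly
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})
print(df)
```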

I want to group by class and, for each class, divide the xer value for cat == 1 by the xer value for cat == 2. In other words, the entries in the final output should be:

  class    div
0     a   0.33  (i.e. 2/6)
1     b    0.5  (i.e. 4/8)
2     c      3  (i.e. 9/3)

Can this be done with groupby? I can't figure out how to do it without manually iterating through each class, and even then the result is neither clean nor pleasant.

4 answers


Given your DataFrame, you can use the following:

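from functools import reduce  # not a builtin on Python 3; pd.np no longer exists in recent pandas
df.groupby('class').agg({'xer': lambda s: reduce(lambda a, b: a / b, s)})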

Which gives you:

            xer
class          
a      0.333333
b      0.500000
c      3.000000

This also handles more than 2 rows per group (if needed), but you should make sure your df is sorted by cat first so that the values appear in the correct order.
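A minimal sketch of the sort-first advice, assuming one xer value per (class, cat) pair. The rows are deliberately shuffled here (an assumption for illustration) so that, without sorting, class a would see cat 2 before cat 1. `operator.truediv` stands in for the division lambda; it is plain `/`.

```python
from functools import reduce
import operator

import pandas as pd

# Same data as the question, but shuffled so cat == 2 comes first for class "a"
df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [2, 1, 1, 1, 2, 2],
    "xer": [6, 4, 9, 2, 8, 3],
})

# Sort by cat first; groupby keeps the row order within each group,
# so each group's xer values arrive as cat 1, cat 2, ...
result = (
    df.sort_values("cat", kind="stable")
      .groupby("class")["xer"]
      .agg(lambda s: reduce(operator.truediv, s))
)
print(result)
```

With this shuffled input, dropping the `sort_values` call would give 6 / 2 = 3.0 for class a instead of 2 / 6.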

Don't do anything too smart:

In [11]: one = df[df["cat"] == 1].set_index("class")["xer"]

In [12]: two = df[df["cat"] == 2].set_index("class")["xer"]

In [13]: one / two
Out[13]:
class
a    0.333333
b    0.500000
c    3.000000
Name: xer, dtype: float64
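Why this works regardless of row order: dividing two Series aligns them on their index (class) before dividing. A small sketch, assuming a scrambled row order and a hypothetical extra class "d" that has no cat == 2 row, shows both the alignment and what happens to an incomplete class.

```python
import pandas as pd

# Rows in scrambled order, plus a hypothetical class "d" with only cat == 1
df = pd.DataFrame({
    "class": ["c", "a", "b", "a", "d", "c", "b"],
    "cat": [2, 2, 1, 1, 1, 1, 2],
    "xer": [3, 6, 4, 2, 5, 9, 8],
})

one = df[df["cat"] == 1].set_index("class")["xer"]
two = df[df["cat"] == 2].set_index("class")["xer"]

# Division aligns on the class index, so row order does not matter;
# a class missing from either side comes out as NaN
div = one / two
print(div)
```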

Here it is step by step:

# get cat==1 and cat==2 merged by class
grouped = df[df.cat==1].merge(df[df.cat==2], on='class')
# calculate div
grouped['div'] = grouped.xer_x / grouped.xer_y
# return the final dataframe
grouped[['class', 'div']]

which gives:

  class       div
0     a  0.333333
1     b  0.500000
2     c  3.000000
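A variant of the same merge, sketched with the `suffixes` parameter of `DataFrame.merge` so the columns are named after the cat values instead of the default `_x` / `_y`. The suffix strings `_cat1` and `_cat2` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# Name the suffixes after the cat values instead of the default _x / _y
merged = df[df["cat"] == 1].merge(
    df[df["cat"] == 2], on="class", suffixes=("_cat1", "_cat2")
)
merged["div"] = merged["xer_cat1"] / merged["xer_cat2"]
result = merged[["class", "div"]]
print(result)
```

Because `merge` defaults to an inner join, a class that lacks either cat value is dropped from the result rather than producing NaN.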

You can reshape the data to make the comparison easier to see:

df2 = df.set_index(['class', 'cat']).unstack()

>>> df2
       xer   
cat      1  2
class        
a        2  6
b        4  8
c        9  3

Then you can do the following to get the desired result:

>>> df2.iloc[:,0].div(df2.iloc[:, 1])

class
a        0.333333
b        0.500000
c        3.000000
Name: (xer, 1), dtype: float64
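A closely related sketch that uses `DataFrame.pivot` instead of `set_index(...).unstack()` and selects the columns by their cat label rather than by position, so the result does not depend on column order. It assumes each (class, cat) pair occurs only once, which `pivot` requires.

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["a", "b", "c", "a", "b", "c"],
    "cat": [1, 1, 1, 2, 2, 2],
    "xer": [2, 4, 9, 6, 8, 3],
})

# One row per class, one column per cat value
wide = df.pivot(index="class", columns="cat", values="xer")

# Select the columns by cat label rather than by position
div = wide[1] / wide[2]
print(div)
```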
