Pandas groupby: calculating the population standard deviation

Question

I'm trying to use groupby and np.std to calculate the standard deviation, but the result is the sample standard deviation (delta degrees of freedom, ddof, equal to 1) rather than the population standard deviation.

Here's an example.

#create dataframe
>>> df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
>>> df
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

#calculate standard deviation using groupby
>>> df.groupby('A').agg(np.std)
          B    values
A                    
1  0.707107  3.535534
2  0.707107  3.535534

#Calculate using numpy (np.std)
>>> np.std([10,15],ddof=0)
2.5
>>> np.std([10,15],ddof=1)
3.5355339059327378

Is there a way to compute the population standard deviation (ddof=0) with groupby? The records I am working with (not the example table above) are not samples, so I am only interested in the population standard deviation.

python numpy pandas statistics

neelshiv, Sep 18 '14 at 14:19

1 answer

EdChum · Accepted Answer · 2014-09-18T14:21:52+0000

You can pass additional keyword arguments for np.std through agg:

In [202]:

df.groupby('A').agg(np.std, ddof=0)

Out[202]:
     B  values
A             
1  0.5     2.5
2  0.5     2.5

In [203]:

df.groupby('A').agg(np.std, ddof=1)

Out[203]:
          B    values
A                    
1  0.707107  3.535534
2  0.707107  3.535534
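
As a possible alternative that does not depend on how agg forwards keyword arguments, the groupby object's own std method accepts ddof directly. The sketch below rebuilds the question's DataFrame and computes the population standard deviation two ways; the names pop_std and pop_std_lambda are only illustrative, and the exact printed formatting may vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# GroupBy.std takes ddof directly; ddof=0 gives the population standard deviation
pop_std = df.groupby('A').std(ddof=0)

# The same computation written as an explicit per-column function
pop_std_lambda = df.groupby('A').agg(lambda s: s.std(ddof=0))

print(pop_std)
```

For this data both approaches should agree with np.std([10, 15], ddof=0) from the question, i.e. 2.5 for the values column in each group.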
