Pandas groupby to calculate population standard deviation

I'm trying to use groupby and np.std to calculate the standard deviation, but it seems to be calculating the sample standard deviation (with ddof, the delta degrees of freedom, equal to 1).

Here's an example.

#create dataframe
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
>>> df
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

#calculate standard deviation using groupby
>>> df.groupby('A').agg(np.std)
      B    values
A                    
1  0.707107  3.535534
2  0.707107  3.535534

#Calculate using numpy (np.std)
>>> np.std([10,15],ddof=0)
2.5
>>> np.std([10,15],ddof=1)
3.5355339059327378

      

Is there a way to compute the population standard deviation (ddof=0) with groupby? The records I am working with (not the example table above) are not samples, so I am only interested in the population standard deviation.



1 answer


You can pass additional arguments for np.std through agg:



In [202]:

df.groupby('A').agg(np.std, ddof=0)

Out[202]:
     B  values
A             
1  0.5     2.5
2  0.5     2.5

In [203]:

df.groupby('A').agg(np.std, ddof=1)

Out[203]:
          B    values
A                    
1  0.707107  3.535534
2  0.707107  3.535534
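
As a possible alternative, the groupby object's own std method also accepts ddof, so np.std may not be needed at all. The sketch below rebuilds the example frame from the question and assumes a reasonably recent pandas; the names pop_std and pop_std_lambda are only illustrative.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# GroupBy.std takes ddof directly; ddof=0 gives the population std
pop_std = df.groupby('A').std(ddof=0)

# Equivalent spelling with an explicit lambda applied to each column
pop_std_lambda = df.groupby('A').agg(lambda s: s.std(ddof=0))

print(pop_std)
```

Both spellings should give 0.5 for B and 2.5 for values in each group, matching Out[202] above.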

      
