Pandas groupby to calculate population standard deviation
I'm trying to use groupby and np.std to calculate the standard deviation, but it seems to be calculating the sample standard deviation (with delta degrees of freedom, ddof, equal to 1) rather than the population one.
Here's an example.
>>> import numpy as np
>>> import pandas as pd
# create dataframe
>>> df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})
>>> df
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25
# calculate standard deviation using groupby
>>> df.groupby('A').agg(np.std)
          B    values
A
1  0.707107  3.535534
2  0.707107  3.535534
# calculate using numpy (np.std) directly, for comparison
>>> np.std([10,15],ddof=0)
2.5
>>> np.std([10,15],ddof=1)
3.5355339059327378
Is there a way to compute the population standard deviation (ddof = 0) with the groupby operator? The records I am working with (unlike the example table above) are not samples, so I am only interested in the population standard deviation.
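
For reference, here is a minimal sketch of what I am after, assuming a pandas version in which the groupby `std` method accepts a `ddof` argument. The names `pop_std` and `pop_std_agg` are only illustrative, and this is one possible approach rather than necessarily the recommended one.

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# Option 1: pass ddof=0 straight to the groupby std method.
pop_std = df.groupby('A').std(ddof=0)

# Option 2: aggregate with a lambda so ddof is set explicitly per column.
pop_std_agg = df.groupby('A').agg(lambda s: s.std(ddof=0))

print(pop_std)
print(pop_std_agg)
```

For group A = 1 the 'values' column holds [10, 15], so both options should agree with np.std([10, 15], ddof=0) from the transcript above.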