Pandas DataFrame.describe(): which standard deviation does it return?
Using the Python pandas library, the DataFrame.describe() function prints the standard deviation of the dataset. However, the documentation page does not say whether this is the "uncorrected" standard deviation (divided by N) or the "corrected" standard deviation (divided by N-1).
Can someone tell me which one it is returning?
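It is the corrected sample standard deviation, i.e. the one normalized by N-1.
You can verify this with a simple series by applying the formula by hand. For [1, 2] the mean is 1.5, the squared deviations sum to 0.5, and dividing by N-1 = 1 gives a variance of 0.5, so the corrected standard deviation is sqrt(0.5):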
In [10]: import pandas as pd
In [11]: s = pd.Series([1, 2])
In [12]: s.std()
Out[12]: 0.70710678118654757
In [13]: from math import sqrt
....: sqrt(0.5)
Out[13]: 0.7071067811865476
Writing out the formula for the corrected sample standard deviation gives the same value:
In [14]: sqrt(1./(len(s)-1) * ((s - s.mean()) ** 2).sum())
Out[14]: 0.7071067811865476
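To make the difference between the two definitions concrete, here is a minimal sketch that computes both by hand and compares them with Series.std(). The sample values and the variable names (squared_dev, corrected, uncorrected) are my own choices for illustration; the only pandas behaviour relied on is the ddof argument of Series.std().

```python
import math
import pandas as pd

# Illustrative data; any numeric series with more than one value works.
s = pd.Series([1, 2, 3, 4])
n = len(s)

# Sum of squared deviations from the mean, shared by both definitions.
squared_dev = ((s - s.mean()) ** 2).sum()

corrected = math.sqrt(squared_dev / (n - 1))  # divides by N-1
uncorrected = math.sqrt(squared_dev / n)      # divides by N

# Series.std() defaults to ddof=1, so it should match the corrected value;
# passing ddof=0 should match the uncorrected one.
print(corrected, s.std())
print(uncorrected, s.std(ddof=0))
```

For any series with more than one distinct value, the corrected figure is the larger of the two, because it divides the same sum by a smaller number.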
DataFrame.describe() calls Series.std() to get the standard deviation. As the documentation for Series.std() tells us:
Returns the unbiased standard deviation along the requested axis.
Normalized to N-1 by default. This can be changed with the ddof argument
Thus, the standard deviation returned by describe() is the "corrected sample standard deviation".
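If you want to check this on a DataFrame directly, the following sketch compares the "std" row of describe() with DataFrame.std() for ddof=1 and ddof=0. The column names and values are made up for illustration. As far as I know describe() itself takes no ddof argument, so the uncorrected value has to be computed separately with std(ddof=0).

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 50.0],
})

# The "std" row of the summary table, one value per column.
described_std = df.describe().loc["std"]

# Corrected (N-1) and uncorrected (N) standard deviations per column.
sample_std = df.std(ddof=1)
population_std = df.std(ddof=0)

print(described_std)
print(sample_std)      # expected to agree with the describe() row
print(population_std)  # expected to be smaller for each column
```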