Pandas DataFrame.describe(): which standard deviation does it report?

In the Python pandas library, the DataFrame.describe() function reports the standard deviation of the dataset. However, the documentation page does not say whether this is the "uncorrected" standard deviation or the "corrected" one.

Can someone tell me which one it is returning?



2 answers


This is the corrected sample standard deviation (normalized by N-1).
You can verify this with a simple Series by applying the formula by hand:

In [10]: import pandas as pd

In [11]: s = pd.Series([1, 2])

In [12]: s.std()
Out[12]: 0.70710678118654757

In [13]: from math import sqrt
   ....:  sqrt(0.5)
Out[13]: 0.7071067811865476

      



The formula for the corrected sample standard deviation gives the same value:

In [14]: sqrt(1./(len(s)-1) * ((s - s.mean()) ** 2).sum())
Out[14]: 0.7071067811865476
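
For contrast, here is a minimal sketch of both variants side by side, assuming a pandas version where Series.std() accepts the ddof argument. The variable names are illustrative only. The uncorrected version divides by N instead of N-1:

```python
import math
import pandas as pd

s = pd.Series([1, 2])

corrected = s.std()          # default ddof=1, divides by N - 1
uncorrected = s.std(ddof=0)  # divides by N

# Manual versions of both formulas for comparison
squared_dev = ((s - s.mean()) ** 2).sum()
manual_corrected = math.sqrt(squared_dev / (len(s) - 1))
manual_uncorrected = math.sqrt(squared_dev / len(s))

print(corrected, manual_corrected)
print(uncorrected, manual_uncorrected)
```

For this two-element series the two results differ noticeably, which makes it easy to see which one a given function returns.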

      



DataFrame.describe() calls Series.std() to get the standard deviation. As the documentation for Series.std() tells us:

Returns the unbiased standard deviation along the requested axis.

Normalized to N-1 by default. This can be changed with the ddof argument



Thus, the standard deviation returned by describe() is the "corrected sample standard deviation".
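
A quick way to check this on your own data is sketched below. The DataFrame contents and variable names are made up for illustration. As far as I know, describe() itself has no ddof parameter, so the uncorrected value has to be computed separately with std(ddof=0):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 4.0, 6.0]})

described_std = df.describe().loc["std"]  # the "std" row reported by describe()
corrected = df.std()                      # default ddof=1 (N - 1)
uncorrected = df.std(ddof=0)              # population version (N)

print(described_std)
print(corrected)
print(uncorrected)
```

The "std" row from describe() should match df.std() column by column, while df.std(ddof=0) gives smaller values.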
