Python numpy.var returning wrong values
I'm trying to do a simple variance calculation on a set of three numbers:
numpy.var([0.82159889, 0.26007962, 0.09818412])
which returns
0.09609366366174843
However, when I calculate the variance by hand, it should be
0.1441405
Seems like such a simple thing, but I haven't been able to find an answer yet.
2 answers
The documentation explains:
ddof : int, optional
"Delta Degrees of Freedom": the divisor used in the calculation is
``N - ddof``, where ``N`` represents the number of elements. By
default `ddof` is zero.
And you have:
>>> import numpy
>>> numpy.var([0.82159889, 0.26007962, 0.09818412], ddof=0)
0.09609366366174843
>>> numpy.var([0.82159889, 0.26007962, 0.09818412], ddof=1)
0.14414049549262264
Both conventions are common enough that you always need to check which one is used by whatever package you use, in whatever language.
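For instance, Python's standard-library statistics module splits the two conventions into separate functions rather than using a ddof argument. A quick cross-check (the results should agree with the numpy values above, up to floating-point rounding in the last digit):
import statistics

vals = [0.82159889, 0.26007962, 0.09818412]

print(statistics.pvariance(vals))  # population variance (divide by N), numpy's default
print(statistics.variance(vals))   # sample variance (divide by N - 1), like ddof=1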
np.var calculates the population variance by default.
The sum of squared errors can be calculated as follows:
>>> import numpy as np
>>> vals = [0.82159889, 0.26007962, 0.09818412]
>>> mean = sum(vals)/3.0
>>> mean
0.3932875433333333
>>> sum((mean-val)**2 for val in vals)
0.2882809909852453
>>> sse = sum((mean-val)**2 for val in vals)
This is the population variance:
>>> sse/3
0.09609366366174843
>>> np.var(vals)
0.09609366366174843
This is the sample variance:
>>> sse/(3-1)
0.14414049549262264
>>> np.var(vals, ddof=1)
0.14414049549262264