Why is the average in this array greater than the maximum?
I ran into a very confusing array in Python. Below is the IPython output when I work with it (with the pylab flag):
In [1]: x = np.load('x.npy')
In [2]: x.shape
Out[2]: (504000,)
In [3]: x
Out[3]:
array([ 98.20354462, 98.26583099, 98.26529694, ..., 98.20297241,
98.19876862, 98.29492188], dtype=float32)
In [4]: min(x), mean(x), max(x)
Out[4]: (97.950058, 98.689438, 98.329773)
I have no idea what's going on. Why does the mean() function return what is obviously the wrong answer?
I don't even know where to start debugging this problem.
I am using Python 2.7.6.
I can share the .npy file if needed.
Probably because of accumulated round-off error when calculating mean(). The relative precision of float32 is ~1e-7 and you have 500000 elements, so summing them directly with sum() can accumulate a relative error of up to ~5%.
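The effect can be reproduced without the original file. The sketch below uses synthetic data as a stand-in for x.npy (the value range and the seed are assumptions) and forces a strictly sequential float32 sum via np.cumsum, since recent NumPy versions no longer sum naively in mean():

```python
import numpy as np

# Synthetic stand-in for the asker's data (an assumption, not the real x.npy).
rng = np.random.default_rng(0)
x = rng.uniform(98.1, 98.3, 504000).astype(np.float32)

# np.cumsum adds the elements one after another, so its last entry
# is the naive running sum, accumulated entirely in float32.
naive_sum = np.cumsum(x, dtype=np.float32)[-1]
naive_mean = naive_sum / np.float32(x.size)

# Same data, but accumulated in float64.
mean64 = np.mean(x, dtype=np.float64)

print(x.min(), naive_mean, x.max())
print(mean64)
```

Once the running sum exceeds 2**25, neighbouring float32 values are 4 apart, so every element just above 98 is effectively added as 100. That systematic rounding is what pushes the naive mean above the maximum.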
The algorithm for computing sum() and mean() is more sophisticated (pairwise summation) in the latest version of NumPy, 1.9.0:
>>> import numpy
>>> numpy.__version__
'1.9.0'
>>> x = numpy.random.random(500000).astype("float32") + 300
>>> min(x), numpy.mean(x), max(x)
(300.0, 300.50024, 301.0)
In the meantime, you can use a higher-precision accumulator type: numpy.mean(x, dtype=numpy.float64)
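To illustrate why pairwise summation helps, here is a minimal sketch of the idea. It is not NumPy's actual implementation; the function name pairwise_sum, the block size of 128, and the synthetic data are assumptions made for this example. Every addition stays in float32, but the partial sums being combined are always of similar magnitude, so far less precision is lost:

```python
import numpy as np

def pairwise_sum(a, block=128):
    """Sum a 1-D float32 array by recursive halving, staying in float32."""
    if a.size <= block:
        # Small chunk: plain sequential float32 sum.
        total = np.float32(0.0)
        for v in a:
            total = np.float32(total + v)
        return total
    mid = a.size // 2
    # Add two partial sums of comparable size.
    return np.float32(pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block))

# Synthetic stand-in for the asker's data (an assumption, not the real x.npy).
rng = np.random.default_rng(0)
x = rng.uniform(98.1, 98.3, 504000).astype(np.float32)

pairwise_mean = pairwise_sum(x) / np.float32(x.size)
mean64 = np.mean(x, dtype=np.float64)
print(pairwise_mean, mean64)
```

With this scheme the float32 mean should agree closely with the float64 result and stay between the minimum and the maximum.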
I've included a snippet from np.mean.__doc__ below. You should try using np.mean(x, dtype=np.float64).
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below). Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.
In single precision, `mean` can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64)
0.55000000074505806