# Why is the average in this array greater than the maximum?

I have a very confusing array in Python. Below is the output from IPython (started with the pylab flag) when I work with it:

```
In : x = np.load('x.npy')

In : x.shape
Out: (504000,)

In : x
Out:
array([ 98.20354462,  98.26583099,  98.26529694, ...,  98.20297241,
        98.19876862,  98.29492188], dtype=float32)

In : min(x), mean(x), max(x)
Out: (97.950058, 98.689438, 98.329773)
```

I have no idea what's going on. Why does the `mean()` function return what is obviously the wrong answer?

I don't even know where to start debugging this problem.

I am using Python 2.7.6.

I can share the `.npy` file if needed.


This is probably due to accumulated round-off error when computing `mean()`. The relative precision of float32 is about 1e-7, and you have about 500000 elements, so summing them directly with `sum()` can accumulate a rounding error on the order of 1e-7 × 5e5 ≈ 5%.
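The estimate, and the single rounding step behind it, can be checked directly. This is a minimal sketch; the value 4.9e7 is an assumed stand-in for the running total late in the sum (roughly 98.2 × 504000), not something taken from the question's file.

```python
import numpy as np

# Machine epsilon for float32: about 1.19e-07.
eps = np.finfo(np.float32).eps
n = 504000  # number of elements in the question's array

# Order-of-magnitude estimate of the relative error of a naive running sum.
print(n * eps)  # roughly 0.06

# Assumed stand-in for the running total late in the sum (about 98.2 * 504000).
big = np.float32(49000000)

# Adjacent float32 values near 4.9e7 are 4 apart ...
print(np.spacing(big))                 # 4.0
# ... so adding ~98.2 to the total actually adds 100.0.
print((big + np.float32(98.2)) - big)  # 100.0
```

Because most of the question's values lie just above 98, most additions in this range are rounded in the same direction, so the error accumulates systematically instead of cancelling out. That is how the mean can end up above the maximum.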

In NumPy 1.9.0, the latest release, `sum()` and `mean()` use a more sophisticated algorithm (pairwise summation), which keeps this error much smaller:

```
>>> import numpy
>>> numpy.__version__
'1.9.0'
>>> x = numpy.random.random(500000).astype("float32") + 300
>>> min(x), numpy.mean(x), max(x)
(300.0, 300.50024, 301.0)
```
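Pairwise summation can be sketched roughly as follows. This is a simplified recursive version for illustration only: `pairwise_sum_f32` is a hypothetical helper, not a NumPy function, the block size of 128 is chosen for illustration, and NumPy's actual C implementation differs in its details. The data mimics the example above (float32 values in [300, 301)).

```python
import numpy as np

def pairwise_sum_f32(a, block=128):
    """Hypothetical pairwise summation in float32 (a sketch, not NumPy's code).

    Blocks of at most `block` elements are summed left to right; larger
    arrays are split in half and the two partial sums are added together.
    """
    n = a.size
    if n == 0:
        return np.float32(0.0)
    if n <= block:
        # Sequential float32 accumulation inside one small block.
        return np.cumsum(a, dtype=np.float32)[-1]
    mid = n // 2
    return np.float32(pairwise_sum_f32(a[:mid], block) +
                      pairwise_sum_f32(a[mid:], block))

# Synthetic data similar to the example above: float32 values in [300, 301).
rng = np.random.RandomState(0)
x = rng.random_sample(500000).astype(np.float32) + 300

# np.cumsum adds strictly left to right, like a naive running sum.
naive = np.cumsum(x, dtype=np.float32)[-1] / x.size
pairwise = pairwise_sum_f32(x) / x.size
reference = x.mean(dtype=np.float64)  # float64 accumulator for comparison
print(naive)
print(pairwise)
print(reference)
```

In the pairwise version, large partial sums are combined only about log2(n) times, instead of every element being added to one ever-growing float32 total, so the rounding error grows far more slowly.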

Alternatively, you can use a higher-precision accumulator type: `numpy.mean(x, dtype=numpy.float64)`
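A minimal sketch of why the accumulator type matters, even without pairwise summation. The question's `x.npy` is not available, so the data here is a hypothetical stand-in: 504000 float32 values drawn uniformly between 97.95 and 98.33, roughly matching the question's min and max.

```python
import numpy as np

# Hypothetical stand-in for the question's x.npy (not available here).
rng = np.random.RandomState(0)
x = rng.uniform(97.95, 98.33, size=504000).astype(np.float32)

# Running sums computed strictly left to right (np.cumsum does no pairwise
# splitting), once with a float32 accumulator and once with float64.
seq_f32 = np.cumsum(x, dtype=np.float32)[-1] / x.size
seq_f64 = np.cumsum(x, dtype=np.float64)[-1] / x.size

# The suggested fix: ask mean() for a float64 accumulator.
fixed = np.mean(x, dtype=np.float64)

print(x.min())
print(x.max())
print(seq_f32)
print(seq_f64)
print(fixed)
```

The sequential float32 mean lands above the maximum, much like the question's result, while both float64 versions stay between the minimum and maximum.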


I've included a snippet from `np.mean.__doc__` below. You should try using `np.mean(x, dtype=np.float64)`.

```
Notes
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.

Note that for floating-point input, the mean is computed using the
same precision the input has.  Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below).  Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.

In single precision, `mean` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875

Computing the mean in float64 is more accurate:

>>> np.mean(a, dtype=np.float64)
0.55000000074505806
```