NumPy dot product gives different results depending on the array dtype

I have some grayscale image data (values 0-255). Depending on the NumPy dtype, I get different results for the dot product. For example, x0 and x1 are the same image:

>>> x0
array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
>>> x1
array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
>>> (x0 == x1).all()
True
>>> np.dot(x0, x1)
133
>>> np.dot(x0.astype(np.float64), x1.astype(np.float64))
6750341.0

I know the second dot product is correct: since x0 and x1 are the same image, the cosine distance between them should be 0:

>>> from scipy.spatial import distance
>>> distance.cosine(x0, x1)
0.99998029729164795
>>> distance.cosine(x0.astype(np.float64), x1.astype(np.float64))
0.0
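
The odd value 0.99998... can be reproduced from the two dot products above. This is a sketch under an assumption: that `distance.cosine` computes 1 - u.v / (||u|| * ||v||), taking the numerator from `np.dot` on the raw `uint8` arrays (which wraps to 133) while computing the norms in floating point (as `np.linalg.norm` does for integer input). Because x0 == x1, ||x0|| * ||x1|| equals x0.x0 = 6750341. The variable names are illustrative only.

```python
import math

# Values copied from the transcripts above.
wrapped_dot = 133        # np.dot(x0, x1) with uint8 inputs (overflowed)
true_dot = 6750341.0     # np.dot with float64 inputs; equals x0 . x0 because x0 == x1

# Assumption: the norms are computed in floating point, so overflow does not affect them.
norm_product = math.sqrt(true_dot) * math.sqrt(true_dot)

# Cosine distance with the overflowed numerator vs. the correct numerator.
cosine_distance = 1.0 - wrapped_dot / norm_product
correct_distance = 1.0 - true_dot / norm_product

print(cosine_distance)   # close to the 0.99998029729164795 reported above
print(correct_distance)  # essentially 0
```

The agreement suggests the cosine distance is wrong for the same reason as the dot product: only the numerator overflowed.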

Surely the dot product should work for integers, and for small arrays it does:

>>> v = np.array([1,2,3], dtype=np.uint8)
>>> v
array([1, 2, 3], dtype=uint8)
>>> np.dot(v, v)
14
>>> np.dot(v.astype(np.float64), v.astype(np.float64))
14.0
>>> distance.cosine(v, v)
0.0
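
The small example works only because its result, 14, fits in the `uint8` range 0 to 255; the length of the array is not what matters. The sketch below keeps three elements but uses larger values (the array `w` is hypothetical), so the true sum of 300 no longer fits and wraps around.

```python
import numpy as np

v = np.array([1, 2, 3], dtype=np.uint8)     # from the question: 1 + 4 + 9 = 14 fits in uint8
w = np.array([10, 10, 10], dtype=np.uint8)  # hypothetical: 100 + 100 + 100 = 300 does not

print(np.dot(v, v))                                        # 14
print(np.dot(w, w).dtype)                                  # uint8: the result keeps the input dtype
print(np.dot(w, w))                                        # 44, i.e. 300 % 256
print(np.dot(w.astype(np.float64), w.astype(np.float64)))  # 300.0
```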

What's happening? Why does the dot product give me different answers depending on the dtype?

1 answer


The data type uint8 is limited to 8 bits, so it can only represent the values 0, 1, ..., 255. The true value of your dot product (6750341) overflows that range, so only its lowest 8 bits are kept, and those 8 bits hold the value 133. In other words, the result wraps around modulo 2 ** 8 = 256. You can check this:



6750341 % (2 ** 8) == 133
# True
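
If you need exact integer results, one option is to cast to a wider integer dtype such as int64 before taking the dot product (casting to float64, as in the question, also works and is exact for typical image sizes). A minimal sketch, using hypothetical random data in place of the real image; int64 cannot overflow here as long as n * 255 ** 2 stays below 2 ** 63, which holds for any realistic image.

```python
import numpy as np

# Hypothetical stand-in for a flattened 8-bit grayscale image (not the asker's data).
rng = np.random.default_rng(0)
x0 = rng.integers(0, 256, size=10_000, dtype=np.uint8)
x1 = x0.copy()

wrapped = np.dot(x0, x1)                                  # uint8 result: only the low 8 bits survive
exact = np.dot(x0.astype(np.int64), x1.astype(np.int64))  # int64 result: no overflow at this size

# The wrapped value is the exact value reduced modulo 2**8.
assert int(exact) % 2**8 == int(wrapped)
print(exact, wrapped)
```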

      
