NumPy dot product gives two different results depending on array dtype
I have some grayscale image data (0-255). Depending on the NumPy dtype, I get different results for the dot product. For example, x0 and x1 are the same image:
>>> x0
array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
>>> x1
array([0, 0, 0, ..., 0, 0, 0], dtype=uint8)
>>> (x0 == x1).all()
True
>>> np.dot(x0, x1)
133
>>> np.dot(x0.astype(np.float64), x1.astype(np.float64))
6750341.0
I know the second dot product is correct because, since they are the same image, the cosine distance should be 0:
>>> from scipy.spatial import distance
>>> distance.cosine(x0, x1)
0.99998029729164795
>>> distance.cosine(x0.astype(np.float64), x1.astype(np.float64))
0.0
Of course the dot product should work for integers. And for small arrays it does:
>>> v = np.array([1,2,3], dtype=np.uint8)
>>> v
array([1, 2, 3], dtype=uint8)
>>> np.dot(v, v)
14
>>> np.dot(v.astype(np.float64), v.astype(np.float64))
14.0
>>> distance.cosine(v, v)
0.0
What's happening? Why does the dot product give me different answers depending on the dtype?
The data type uint8 is limited to 8 bits, so it can only represent the values 0, 1, ..., 255. Your dot product overflows that range, so only the lowest 8 bits of the result are kept. Those 8 bits hold the value 133. You can check this:
6750341 % (2 ** 8) == 133
# True
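To see the same wraparound on something small enough to check by hand, here is a minimal sketch. The three values are made up for illustration (they are not from your image), and casting to np.int64 is just one reasonable choice of wider type; your astype(np.float64) works as well.

```python
import numpy as np

# Hypothetical values, chosen so the true dot product exceeds 255.
x = np.array([200, 100, 50], dtype=np.uint8)

# uint8 inputs produce a uint8 result, so the sum wraps modulo 2**8.
wrapped = int(np.dot(x, x))

# Casting to a wider integer type first keeps the full value.
wide = x.astype(np.int64)
exact = int(np.dot(wide, wide))

print(wrapped)         # 20
print(exact)           # 52500
print(exact % 2 ** 8)  # 20, the same as the wrapped result
```

This is also why your small array [1, 2, 3] looked fine: its dot product is 14, which still fits in 8 bits.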