Matrix multiplication: (n-by-m) * (m-by-k) and (m-by-k) * (k-by-n) have very different speeds
I have the following Python code:
import numpy
import time
A = numpy.random.random((10,60000))
B = numpy.random.random((60000,785))
C = numpy.random.random((785,10))
t = time.time()
D = numpy.dot(A, B)
print "%.2f s" % (time.time() - t)
t = time.time()
E = numpy.dot(B, C)
print "%.2f s" % (time.time() - t)
I think the two matrix multiplications A * B and B * C should take approximately the same amount of time, since both multiplications involve 10 * 60000 * 785 multiplications.
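As a quick sanity check of that count, here is a minimal sketch that computes the number of scalar multiplications in a naive (n-by-m) * (m-by-k) product. The helper name `matmul_multiplications` is made up for illustration, and the count assumes the textbook algorithm, not whatever routine NumPy actually calls:

```python
def matmul_multiplications(left_shape, right_shape):
    """Scalar multiplications in a naive (n-by-m) * (m-by-k) product."""
    n, m = left_shape
    m2, k = right_shape
    if m != m2:
        raise ValueError("inner dimensions do not match")
    # Each of the n*k output entries needs m multiplications.
    return n * m * k

ab = matmul_multiplications((10, 60000), (60000, 785))   # A * B
bc = matmul_multiplications((60000, 785), (785, 10))     # B * C
print("%d %d" % (ab, bc))  # prints "471000000 471000000"
```

Both products come out at the same count, so the arithmetic alone does not explain a timing gap.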
However, I have very different timings on different machines. On my laptop (Windows 7, 2.40 GHz processor, 8 GB memory, Python 2.7, Numpy 1.7.1) I got:
0.21 s
0.21 s
which is what I expected. On a cluster machine (Linux CentOS 5.6, 2.66 GHz, 16 GB memory, Python 2.7.3, Numpy 1.8.1) I got:
6.77 s
1.10 s
where A * B is much slower than B * C.
Can anyone explain why the two multiplications take such different amounts of time? I'm not sure which details of the configuration are relevant, but I'll try to provide any information you need.
(Too long for a comment, not really an answer.)
If you run the following, do you see large performance differences?
#!/usr/bin/env python3.4
import numpy
import time
A = numpy.random.random((10,60000))
B = numpy.random.random((60000,785))
C = numpy.random.random((785,10))
t = time.time()
D = A.dot(B)
print("%.2f s" % (time.time() - t))
t = time.time()
D = numpy.transpose(B).dot(numpy.transpose(A))
print("%.2f s" % (time.time() - t))
t = time.time()
D = B.dot(C)
print("%.2f s" % (time.time() - t))
t = time.time()
D = numpy.transpose(C).dot(numpy.transpose(B))
print("%.2f s" % (time.time() - t))
When I run this I get:
0.21 s
0.22 s
0.44 s
0.22 s
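The only case that stands out is the third one (0.44 s instead of about 0.22 s), and its transposed counterpart is fast again. Since the number of multiplications is the same in all four cases, this strongly suggests that the difference comes from memory access patterns.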
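To see why transposing changes the access pattern at all, here is a minimal sketch that inspects the memory layout of an array and of its transpose. It uses a smaller array than the one in the question so that it runs quickly, and it only shows the layout. How much that layout affects the timing depends on the matrix-multiplication routine your NumPy build uses, which this sketch does not measure.

```python
import numpy

# Smaller than B in the question, but laid out the same way.
B = numpy.random.random((600, 785))
Bt = numpy.transpose(B)

# numpy.random.random returns a C-ordered (row-major) array.
print(B.flags['C_CONTIGUOUS'], B.flags['F_CONTIGUOUS'])

# The transpose is a view on the same buffer with the strides swapped,
# so it is Fortran-ordered (column-major) and no data is copied.
print(Bt.flags['C_CONTIGUOUS'], Bt.flags['F_CONTIGUOUS'])
print(B.strides, Bt.strides)
print(numpy.shares_memory(B, Bt))
```

So `numpy.transpose(B).dot(numpy.transpose(A))` computes the same numbers as `A.dot(B)` (transposed), but the operands are handed to the multiplication with a different memory order, which is the one thing that differs between the fast and slow variants above.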