Numpy vs Matlab speed - arctan and power
In Python and MATLAB, I wrote code that generates a matrix and populates it with a function of the indices. The Python code takes approximately 20 times as long to run as the MATLAB code. I wrote two Python functions that produce the same result, aWay() and bWay(); bWay() is based on this answer.
Here's the complete Python code:
import numpy as np
from timeit import timeit

height = 1080
width = 1920
heightCm = 30
distanceCm = 70

centerY = height / 2 - 0.5
centerX = width / 2 - 0.5
constPart = height * heightCm / distanceCm

def aWay():
    M = np.empty([height, width], dtype=np.float64)
    for y in xrange(height):
        for x in xrange(width):
            M[y, x] = np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart)

def bWay():
    M = np.frompyfunc(
        lambda y, x: np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart), 2, 1
    ).outer(
        np.arange(height),
        np.arange(width),
    ).astype(np.float64)
and here's the complete Matlab code:
height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;
centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;
constPart = height * heightCm / distanceCm;
M = zeros(height, width);
for y = 1 : height
    for x = 1 : width
        M(y, x) = atan(((centerX - x)^2 + (centerY - y)^2)^0.5 / constPart);
    end
end
Python runtime measured with timeit.timeit:
aWay() - 6.34s
bWay() - 6.68s
Matlab execution time measured with tic toc:
0.373s
To narrow it down, I measured the arctan, squaring, and empty-loop times separately.
Python:
>>> timeit('arctan(3)','from numpy import arctan', number = 1000000)
1.3365135641797679
>>> timeit('pow(3, 2)', number = 1000000)
0.11460829719908361
>>> timeit('power(3, 2)','from numpy import power', number = 1000000)
1.5427879383046275
>>> timeit('for x in xrange(10000000): pass', number = 1)
0.18364813832704385
Matlab:
tic
for i = 1 : 1000000
atan(3);
end
toc
Elapsed time is 0.179802 seconds.
tic
for i = 1 : 1000000
3^2;
end
toc
Elapsed time is 0.044160 seconds.
tic
for x = 1:10000000
end
toc
Elapsed time is 0.034853 seconds.
In all three cases, the Python code execution time was several times longer.
Is there anything I can do to improve the performance of this Python code?
I'll focus only on the Python part and how you can optimize it (I've never used MATLAB, sorry).
If I understand your code correctly, you can use:
def fastway():
    # y has shape (height, 1) and x has shape (1, width),
    # so the result has shape (height, width), matching M[y, x].
    y, x = np.ogrid[:height, :width]
    return np.arctan(np.hypot(centerX-x, centerY-y) / constPart)
It's vectorized and should be surprisingly fast.
%timeit fastway()
# 289 ms ± 9.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit aWay()
# 28.2 s ± 243 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit bWay()
# 29.3 s ± 790 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
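As a sanity check, the vectorized version can be compared against the explicit double loop on a small grid. The following is a minimal sketch, assuming Python 3 (range instead of xrange). The names loop_way and vector_way and the reduced grid size are chosen here for illustration and do not come from the question. The row index y is listed first in np.ogrid so that the result has shape (height, width), like M[y, x].

```python
import numpy as np

# Small grid so the pure-Python loop finishes quickly (illustrative sizes).
height, width = 12, 16
height_cm, distance_cm = 30, 70
center_y = height / 2 - 0.5
center_x = width / 2 - 0.5
const_part = height * height_cm / distance_cm


def loop_way():
    """Element-by-element reference, same formula as aWay()."""
    M = np.empty((height, width), dtype=np.float64)
    for y in range(height):
        for x in range(width):
            r = ((center_x - x) ** 2 + (center_y - y) ** 2) ** 0.5
            M[y, x] = np.arctan(r / const_part)
    return M


def vector_way():
    """Broadcast version: y is a column vector, x is a row vector."""
    y, x = np.ogrid[:height, :width]
    return np.arctan(np.hypot(center_x - x, center_y - y) / const_part)


reference = loop_way()
fast = vector_way()
print(fast.shape)                    # (12, 16)
print(np.allclose(reference, fast))  # True
```

Broadcasting the (height, 1) column against the (1, width) row produces the full grid without materializing two full index matrices first.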
In case you're wondering: np.hypot(x, y) is equivalent to (x**2 + y**2)**0.5. It's not necessarily faster, but it's shorter, and in some edge cases it gives more accurate results.
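One such edge case is intermediate overflow. The sketch below uses an input value picked purely for illustration: squaring it exceeds the float64 range, so the explicit formula returns inf, while np.hypot returns a finite result.

```python
import numpy as np

big = np.array([1e200])

# Squaring overflows float64 (the maximum is about 1.8e308),
# so the explicit formula ends up as inf.
with np.errstate(over="ignore"):
    naive = (big**2 + big**2) ** 0.5

# np.hypot avoids the overflowing intermediate result.
robust = np.hypot(big, big)

print(naive)   # [inf]
print(robust)  # finite, about 1.414e200
```

For the pixel distances in the question the two forms agree to within rounding, so this only matters for very large or very small inputs.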
Also, if you ever need to work with scalars, you shouldn't use NumPy functions. NumPy functions have such a high per-call overhead that processing one element takes roughly as long as processing one thousand elements; see, for example, my answer to the question "Performance with different vectorization method in numpy".
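A rough way to see this overhead yourself is to time the same scalar computation through math.atan and np.arctan, and then one vectorized call over the same number of elements. This is only a sketch: the call count n is an arbitrary choice made here, and the absolute numbers depend on the machine and the NumPy version.

```python
import math
from timeit import timeit

import numpy as np

n = 100_000  # arbitrary number of repetitions for this sketch

# The same scalar computation via the standard library and via NumPy.
t_math = timeit("atan(3.0)", setup="from math import atan", number=n)
t_numpy = timeit("arctan(3.0)", setup="from numpy import arctan", number=n)

# A single vectorized call over n elements, for comparison.
values = np.full(n, 3.0)
t_vector = timeit(lambda: np.arctan(values), number=1)

print(f"math.atan, {n} scalar calls: {t_math:.4f} s")
print(f"np.arctan, {n} scalar calls: {t_numpy:.4f} s")
print(f"np.arctan, one call on {n} elements: {t_vector:.4f} s")
```

Both functions return the same value for a scalar input, so for scalar-only code the standard-library math module is a drop-in choice.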
To complete MSeifert's answer, here is the vectorized MATLAB code:
height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;
centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;
constPart = height * heightCm / distanceCm;
[x, y] = meshgrid(1:width, 1:height);
M = atan(hypot(centerX-x, centerY-y) / constPart);
On my machine, this takes 0.057 seconds, while the double for loop takes 0.20 seconds.
On the same machine, MSeifert's Python solution takes 0.082 seconds.