Numpy vs Matlab speed - arctan and power

I wrote code in Python and in Matlab that creates a matrix and fills each element with a function of its indices. The Python code takes approximately 20 times as long to run as the Matlab code. I wrote two Python functions that produce the same result, aWay() and bWay(); bWay() is based on this answer.

Here's the complete Python code:

import numpy as np
from timeit import timeit

height = 1080
width = 1920
heightCm = 30
distanceCm = 70

centerY = height / 2 - 0.5
centerX = width / 2 - 0.5

constPart = height * heightCm / distanceCm

def aWay():
    M = np.empty([height, width], dtype=np.float64)
    for y in xrange(height):
        for x in xrange(width):
            M[y, x] = np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart)

def bWay():
    M = np.frompyfunc(
        lambda y, x: np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart), 2, 1
    ).outer(
        np.arange(height),
        np.arange(width),
    ).astype(np.float64)

      

and here's the complete Matlab code:

height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;

centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;

constPart = height * heightCm / distanceCm;
M = zeros(height, width);
for y = 1 : height
    for x = 1 : width
        M(y, x) = atan(((centerX - x)^2 + (centerY - y)^2)^0.5 / constPart);
    end
end

      

Python runtime measured with timeit.timeit:

aWay() - 6.34s
bWay() - 6.68s

      

Matlab execution time measured with tic toc:

0.373s

      

To narrow it down, I timed arctan, squaring, and an empty loop separately.

Python:

>>> timeit('arctan(3)','from numpy import arctan', number = 1000000)
1.3365135641797679
>>> timeit('pow(3, 2)', number = 1000000)
0.11460829719908361
>>> timeit('power(3, 2)','from numpy import power', number = 1000000)
1.5427879383046275
>>> timeit('for x in xrange(10000000): pass', number = 1)
0.18364813832704385

      

Matlab:

tic
for i = 1 : 1000000
    atan(3);
end
toc
Elapsed time is 0.179802 seconds.
tic
for i = 1 : 1000000
    3^2;
end
toc
Elapsed time is 0.044160 seconds.
tic
for x = 1:10000000
end
toc
Elapsed time is 0.034853 seconds.

      

In all three cases, the Python code execution time was several times longer.

Is there anything I can do to improve the performance of this Python code?



2 answers


I'll focus only on the Python part and how you can optimize it (I've never used MATLAB, sorry).

If I understand your code correctly, you can use:

def fastway():
    y, x = np.ogrid[:height, :width]  # open grids that broadcast to shape (height, width), matching M[y, x]
    return np.arctan(np.hypot(centerX-x, centerY-y) / constPart)

      

It's vectorized and should be surprisingly fast.
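
As a sanity check, here is a minimal sketch that compares the explicit double loop with a broadcast version on a small grid. It assumes Python 3; the 6 x 8 grid and the names loop_version and vectorized_version are illustrative and not from the question. The ogrid order is chosen so that the result has shape (height, width).

```python
import math

import numpy as np

# Small illustrative grid (assumed sizes, not the 1080 x 1920 from the question).
height, width = 6, 8
heightCm, distanceCm = 30, 70
centerY = height / 2 - 0.5
centerX = width / 2 - 0.5
constPart = height * heightCm / distanceCm


def loop_version():
    # Element-by-element fill, mirroring the structure of aWay().
    M = np.empty((height, width), dtype=np.float64)
    for y in range(height):
        for x in range(width):
            r = math.sqrt((centerX - x) ** 2 + (centerY - y) ** 2)
            M[y, x] = math.atan(r / constPart)
    return M


def vectorized_version():
    # y has shape (height, 1) and x has shape (1, width); broadcasting
    # expands the expression to the full (height, width) matrix.
    y, x = np.ogrid[:height, :width]
    return np.arctan(np.hypot(centerX - x, centerY - y) / constPart)


same = np.allclose(loop_version(), vectorized_version())
print(same)
```

If the two versions disagree, or the shape comes out as (width, height), the x and y grids are most likely swapped.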



%timeit fastway()
# 289 ms ± 9.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit aWay()
# 28.2 s ± 243 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit bWay()
# 29.3 s ± 790 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

      

In case you're wondering: np.hypot(x, y) is equivalent to (x**2 + y**2)**0.5. It is not necessarily faster, but it is shorter and gives more accurate results in some edge cases.
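
One such edge case is overflow in the intermediate squares. The sketch below uses deliberately extreme inputs (1e200, chosen only for illustration) to show the naive formula overflowing to infinity while np.hypot stays finite.

```python
import numpy as np

x = np.float64(1e200)
y = np.float64(1e200)

# x**2 exceeds the float64 range, so the naive formula ends up at infinity.
# The overflow warning is silenced because the overflow is intentional here.
with np.errstate(over="ignore"):
    naive = (x**2 + y**2) ** 0.5

# np.hypot computes the same quantity without overflowing in the intermediate step.
robust = np.hypot(x, y)

print(naive, robust)
```

For pixel coordinates like the ones in the question, both formulas give the same result to within rounding, so this matters only for very large or very small inputs.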

Also, if you ever need to work with scalars, you shouldn't use NumPy functions. NumPy functions have such a high per-call overhead that processing one element takes roughly as long as processing one thousand elements; see, for example, my answer to the question "Performance with different vectorization method in numpy".
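
A rough way to see this overhead yourself is to time math.atan on a scalar, np.arctan on a scalar, and np.arctan on a 1000-element array. This is only a sketch: the repeat count and array size are arbitrary choices, and the actual numbers depend on the machine and the NumPy version.

```python
import math
from timeit import timeit

import numpy as np

n = 20000  # number of calls per measurement (arbitrary)
arr = np.full(1000, 3.0)  # 1000 copies of the same scalar input

# Scalar input through the standard library.
t_math = timeit(lambda: math.atan(3.0), number=n)
# Same scalar through NumPy: each call pays the ufunc dispatch overhead.
t_numpy_scalar = timeit(lambda: np.arctan(3.0), number=n)
# 1000 elements per call: the overhead is paid once per call, not per element.
t_numpy_array = timeit(lambda: np.arctan(arr), number=n)

print("math.atan, scalar:      %.4f s" % t_math)
print("np.arctan, scalar:      %.4f s" % t_numpy_scalar)
print("np.arctan, 1000 values: %.4f s" % t_numpy_array)
```

Dividing the last timing by 1000 gives an estimate of the per-element cost of the vectorized call, which can then be compared with the scalar timings.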



To complete MSeifert's answer, here is the vectorized Matlab code:

height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;

centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;

constPart = height * heightCm / distanceCm;
[x, y] = meshgrid(1:width, 1:height);
M = atan(hypot(centerX-x, centerY-y) / constPart);

      



On my machine, this takes 0.057 seconds, while the double for loop takes 0.20 seconds.

On the same machine, MSeifert's Python solution takes 0.082 seconds.
