Optimizing Python code
I wrote the following function to estimate orientation from a 3-axis accelerometer signal (X, Y, Z):
X.shape
Out[4]: (180000L,)
Y.shape
Out[4]: (180000L,)
Z.shape
Out[4]: (180000L,)
def estimate_orientation(self,X,Y,Z):
    sigIn=np.array([X,Y,Z]).T
    N=len(sigIn)
    sigOut=np.empty(shape=(N,3))
    sigOut[sigOut==0]=None
    i=0
    while i<N:
        sigOut[i,:] = np.arccos(sigIn[i,:]/np.linalg.norm(sigIn[i,:]))*180/math.pi
        i=i+1
    return sigOut
It takes quite a long time (~2.2 seconds) to execute this function on a signal of 180,000 samples. I know it is not written in a "Pythonic" way. Could you help me optimize the execution time?
Thanks!
Initial approach
One approach, using broadcasting, would be:
np.arccos(sigIn/np.linalg.norm(sigIn,axis=1,keepdims=1))*180/np.pi
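As a sanity check, the sketch below compares the broadcast expression with a row-by-row loop on a small random array. The names `loop_version` and `broadcast_version` and the test data are illustrative assumptions, not part of the original post.

```python
import numpy as np

def loop_version(sig_in):
    """Row-by-row reference: angle (degrees) between each sample and each axis."""
    out = np.empty_like(sig_in, dtype=float)
    for i in range(len(sig_in)):
        out[i, :] = np.degrees(np.arccos(sig_in[i, :] / np.linalg.norm(sig_in[i, :])))
    return out

def broadcast_version(sig_in):
    """Same computation on the whole (N, 3) array at once."""
    # keepdims=True keeps the norms as shape (N, 1), so the division
    # broadcasts each row's norm across its three components.
    norms = np.linalg.norm(sig_in, axis=1, keepdims=True)
    return np.degrees(np.arccos(sig_in / norms))

rng = np.random.default_rng(0)
sample = rng.random((1000, 3)) + 0.1  # hypothetical data; offset avoids zero-norm rows
assert np.allclose(loop_version(sample), broadcast_version(sample))
```

Note that a row of all zeros has norm 0, so either version would return NaN for it (with a runtime warning), which may or may not be what you want for real accelerometer data.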
Further optimization - I
We could use np.einsum to replace the np.linalg.norm part. Thus:
np.linalg.norm(sigIn,axis=1,keepdims=1)
can be replaced with:
np.sqrt(np.einsum('ij,ij->i',sigIn,sigIn))[:,None]
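To see why the replacement is valid: the subscript string 'ij,ij->i' multiplies the two arrays elementwise and sums over the column index j, which gives the squared norm of each row. A small check, on assumed random data:

```python
import numpy as np

rng = np.random.default_rng(1)
sig_in = rng.random((5, 3))  # hypothetical small input

# 'ij,ij->i': multiply elementwise, then sum over j (the columns) for each row i.
squared_norms = np.einsum('ij,ij->i', sig_in, sig_in)
norms_einsum = np.sqrt(squared_norms)[:, None]  # [:, None] restores shape (5, 1)
norms_linalg = np.linalg.norm(sig_in, axis=1, keepdims=True)

assert norms_einsum.shape == norms_linalg.shape == (5, 1)
assert np.allclose(norms_einsum, norms_linalg)
```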
Further optimization - II
A further speedup can come from the numexpr module, which works well with huge arrays and with operations involving trigonometric functions. In our case, that function is arccos. So we reuse the einsum part from the previous optimization section and then apply arccos from numexpr to the result.
Thus, the implementation would look something like this:
import numexpr as ne
pi_val = np.pi
s = np.sqrt(np.einsum('ij,ij->i',sigIn,sigIn))[:,None]
out = ne.evaluate('arccos(sigIn/s)*180/pi_val')
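Since numexpr is a third-party package, a self-contained variant might guard the import. The sketch below is one way to do that. The function name `orientation_degrees`, the NumPy fallback, and the explicit `local_dict` argument are choices made for this example rather than part of the answer above.

```python
import numpy as np

try:
    import numexpr as ne  # optional third-party dependency
except ImportError:
    ne = None

def orientation_degrees(sig_in):
    """Angle in degrees between each row of an (N, 3) array and each axis."""
    s = np.sqrt(np.einsum('ij,ij->i', sig_in, sig_in))[:, None]
    if ne is None:
        # Fallback (an assumption of this sketch): plain NumPy, same result.
        return np.arccos(sig_in / s) * 180 / np.pi
    # numexpr evaluates the whole expression in chunks, which avoids the
    # full-size temporary arrays NumPy allocates for each intermediate step.
    return ne.evaluate('arccos(sig_in / s) * 180 / pi_val',
                       local_dict={'sig_in': sig_in, 's': s, 'pi_val': np.pi})

rng = np.random.default_rng(2)
sample = rng.random((1000, 3)) + 0.1  # hypothetical data; offset avoids zero-norm rows
expected = np.degrees(np.arccos(sample / np.linalg.norm(sample, axis=1, keepdims=True)))
assert np.allclose(orientation_degrees(sample), expected)
```

Passing `local_dict` makes the variables used by the expression explicit instead of relying on numexpr looking them up in the calling frame.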
Runtime test
Approaches -
import math
import numpy as np
import numexpr as ne

def original_app(sigIn):
    N=len(sigIn)
    sigOut=np.empty(shape=(N,3))
    sigOut[sigOut==0]=None
    i=0
    while i<N:
        sigOut[i,:] = np.arccos(sigIn[i,:]/np.linalg.norm(sigIn[i,:]))*180/math.pi
        i=i+1
    return sigOut

def broadcasting_app(signIn):
    s = np.linalg.norm(signIn,axis=1,keepdims=1)
    return np.arccos(signIn/s)*180/np.pi

def einsum_app(signIn):
    s = np.sqrt(np.einsum('ij,ij->i',signIn,signIn))[:,None]
    return np.arccos(signIn/s)*180/np.pi

def numexpr_app(signIn):
    pi_val = np.pi
    s = np.sqrt(np.einsum('ij,ij->i',signIn,signIn))[:,None]
    return ne.evaluate('arccos(signIn/s)*180/pi_val')
Timing -
In [115]: a = np.random.rand(180000,3)
In [116]: %timeit original_app(a)
...: %timeit broadcasting_app(a)
...: %timeit einsum_app(a)
...: %timeit numexpr_app(a)
...:
1 loops, best of 3: 1.38 s per loop
100 loops, best of 3: 15.4 ms per loop
100 loops, best of 3: 13.3 ms per loop
100 loops, best of 3: 4.85 ms per loop
In [117]: 1380/4.85 # Speedup number
Out[117]: 284.5360824742268
That is a speedup of more than 280x!
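The %timeit lines above are IPython magics. Outside IPython, a comparable measurement could be sketched with the standard-library timeit module. The helper `best_ms` and its repeat counts are assumptions for illustration, and only the two NumPy-only variants are timed here; the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def broadcasting_app(sig_in):
    s = np.linalg.norm(sig_in, axis=1, keepdims=True)
    return np.arccos(sig_in / s) * 180 / np.pi

def einsum_app(sig_in):
    s = np.sqrt(np.einsum('ij,ij->i', sig_in, sig_in))[:, None]
    return np.arccos(sig_in / s) * 180 / np.pi

a = np.random.rand(180000, 3)

def best_ms(func, arr, repeat=3, number=10):
    """Best-of-`repeat` average time per call, in milliseconds."""
    totals = timeit.repeat(lambda: func(arr), repeat=repeat, number=number)
    return min(totals) / number * 1e3

timings = {f.__name__: best_ms(f, a) for f in (broadcasting_app, einsum_app)}
for name, ms in timings.items():
    print(f"{name}: {ms:.2f} ms per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports in the session above.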