Vectorizing python loop over numpy array

I need to speed up this loop, as it is very slow. But I don't know how to vectorize it, because each output value depends on the previous one. Any suggestions?

import numpy as np

sig = np.random.randn(44100)
alpha = .9887
beta = .999

out = np.zeros_like(sig)

for n in range(1, len(sig)):
    if np.abs(sig[n]) >= out[n-1]:
        out[n] = alpha * out[n-1] + (1 - alpha) * np.abs( sig[n] )
    else:
        out[n] = beta * out[n-1]

      

2 answers


Low potential for vectorization in code "with feedback"

Dependency analysis takes most of the potential "vectorization" parallelism out of the picture: each out[n] depends on out[n-1], so the iterations cannot run side by side. (A JIT compiler cannot vectorize "across" such a dependency barrier either.)

You can pre-compute some reusable values in vectorized form, but there is no direct Python syntax (short of an external JIT compiler) to map such a feedback computation onto CPU-register-aligned vector-parallel execution:

import numpy as np
from zmq import Stopwatch    # pyzmq >= 2.11 provides a [usec]-resolution .Stopwatch()
aStopWATCH =    Stopwatch()  # a performance-measurement .Stopwatch() instance

# sig, alpha, beta and out = np.zeros_like(sig) as defined in the question
sig    = np.abs( sig )       # one vectorised abs() up front avoids per-iteration np.abs() calls
aConst = ( 1 - alpha )       # hoists many repetitive SUB(s) out of the loop

for thisPtr in range( 1, len( sig ) ): # FORWARD-SHIFTING-DEPENDENCE LOOP:
    prevPtr = thisPtr - 1              # prevPtr -> "previous" time-slice in out[] ( re-used 2 x len(sig) times )
    if  sig[thisPtr] < out[prevPtr]:                                    # 1st re-use
        out[thisPtr] = out[prevPtr] * beta                              # 2nd re-use
    else:
        out[thisPtr] = out[prevPtr] * alpha + ( aConst * sig[thisPtr] ) # 2nd re-use
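As an aside (not part of the original answer): the dependency barrier here comes from the data-dependent branch. If the branch were absent, i.e. if the alpha-formula applied at every sample, the recurrence would be a plain first-order IIR filter, which scipy.signal.lfilter evaluates entirely in compiled code. A sketch of that branch-free case, assuming SciPy is available:

```python
# Sketch (assumes SciPy): the branch-free variant of the recurrence
#   y[n] = alpha * y[n-1] + (1 - alpha) * |sig[n]|
# is a first-order IIR filter, so it CAN be evaluated without a Python loop.
import numpy as np
from scipy.signal import lfilter

alpha = .9887
sig   = np.random.randn(44100)
x     = np.abs(sig)

# transfer function coefficients: b = [1 - alpha], a = [1, -alpha]
out_iir = lfilter([1 - alpha], [1, -alpha], x)

# reference: the same recurrence as an explicit loop
ref = np.empty_like(x)
ref[0] = (1 - alpha) * x[0]
for n in range(1, len(x)):
    ref[n] = alpha * ref[n - 1] + (1 - alpha) * x[n]

assert np.allclose(out_iir, ref)
```

This does not solve the question as posed (the beta-branch re-introduces the data-dependent feedback), but it shows why a JIT compiler such as Numba, rather than numpy vectorization, is the natural tool here.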

      



A good example of vectorized acceleration can be seen in cases where the computation can be spread in parallel across the 1D, 2D or even 3D structure of a native numpy array. For an approximately 100x speed-up, see Accelerate processing of RGBA-2D matrix to vectorized code for PNG-image processing (an OpenGL shader pipeline).
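To illustrate the contrast (an illustrative sketch, not the code from the linked answer): a per-pixel computation with no feedback between elements vectorizes into a single whole-array expression:

```python
import numpy as np

img = np.random.randint(0, 256, (256, 256, 4), dtype=np.uint8)  # toy "RGBA" array

def luma_loop(a):
    # per-pixel weighted sum of R, G, B channels -- Python-level double loop
    h, w, _ = a.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = 0.299 * a[i, j, 0] + 0.587 * a[i, j, 1] + 0.114 * a[i, j, 2]
    return out

def luma_vec(a):
    # the same computation as one whole-array expression -- no Python-level loop
    return 0.299 * a[..., 0] + 0.587 * a[..., 1] + 0.114 * a[..., 2]

assert np.allclose(luma_loop(img), luma_vec(img))
```

Every output element here depends only on its own input element, so numpy can process the whole array in compiled code; that is exactly the independence the feedback loop above lacks.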

Still, an almost 3x speed-up in plain Python

Even this simple pure-Python revision of the code runs more than 2.8x faster (as measured here, i.e. without installing any dedicated JIT-optimising compiler):

>>> def aForwardShiftingDependenceLOOP(): # proposed code-revision
...     aStopWATCH.start()                # ||||||||||||||||||.start
...     for thisPtr in range( 1, len( sig ) ):
...         #        |vvvvvvv|------------# FORWARD-SHIFTING-LOOP DEPENDENCE
...         prevPtr = thisPtr - 1  #|vvvvvvv|--STEP-SHIFTING avoids Numpy syntax
...         if ( sig[ thisPtr] < out[prevPtr] ):
...             out[  thisPtr] = out[prevPtr] * beta
...         else:
...             out[  thisPtr] = out[prevPtr] * alpha + ( aConst * sig[thisPtr] )
...     usec = aStopWATCH.stop()          # ||||||||||||||||||.stop
...     print usec, " [usec]"

>>> aForwardShiftingDependenceLOOP()
57593  [usec]
57879  [usec]
58085  [usec]

>>> def anOriginalForLOOP():
...     aStopWATCH.start()
...     for n in range( 1, len( sig ) ):
...         if ( np.abs( sig[n] ) >= out[n-1] ):
...             out[n] = out[n-1] * alpha + ( 1 - alpha ) * np.abs( sig[n] )
...         else:
...             out[n] = out[n-1] * beta
...     usec = aStopWATCH.stop()
...     print usec, " [usec]"

>>> anOriginalForLOOP()
164907  [usec]
165674  [usec]
165154  [usec]

      



Numba's just-in-time compiler deals quite well with the indexing overhead you are facing here, by compiling the function to native code at its first execution:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:import numpy as np
:
:sig = np.random.randn(44100)
:alpha = .9887
:beta = .999
:
:def nonvectorized(sig):
:    out = np.zeros_like(sig)
:
:    for n in range(1, len(sig)):
:        if np.abs(sig[n]) >= out[n-1]:
:            out[n] = alpha * out[n-1] + (1 - alpha) * np.abs( sig[n] )
:        else:
:            out[n] = beta * out[n-1]
:    return out
:--

In [2]: nonvectorized(sig)
Out[2]: 
array([ 0.        ,  0.01862503,  0.04124917, ...,  1.2979579 ,
        1.304247  ,  1.30294275])

In [3]: %timeit nonvectorized(sig)
10 loops, best of 3: 80.2 ms per loop

In [4]: from numba import jit

In [5]: vectorized = jit(nonvectorized)

In [6]: np.allclose(vectorized(sig), nonvectorized(sig))
Out[6]: True

In [7]: %timeit vectorized(sig)
1000 loops, best of 3: 249 Β΅s per loop

      

EDIT: As pointed out in the comments, adding jit timings. jit(nonvectorized) merely creates a lightweight wrapper around the function and is therefore a cheap operation:



In [8]: %timeit jit(nonvectorized)
10000 loops, best of 3: 45.3 Β΅s per loop

      

The function itself is compiled at its first execution (hence "just in time"), which takes a while, but probably not prohibitively long:

In [9]: %timeit jit(nonvectorized)(sig)
10 loops, best of 3: 169 ms per loop

      







