How to speed up enumeration over a numpy array / how to enumerate a numpy array efficiently?

I need to generate a lot of random numbers. I tried using random.random, but that function is rather slow, so I switched to numpy.random.random, which is much faster. So far, so good. Each generated random number is then used to calculate something (based on that number), so I enumerate over the numbers and replace each value. That seems to kill all of the speed-up I gained. Here are the statistics generated with timeit():

test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158


test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516


As you can see, generating the numbers is roughly 7.5x faster with numpy (0.018s vs 0.133s), but enumerating over them wipes out that gain entirely; with the loop, the numpy version is actually slower than the plain random one.

Below is the code I am using:

import numpy as np
import timeit
import random

NBR_TIMES = 10
NBR_ELEMENTS = 100000

def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

if __name__ == '__main__':
    from timeit import Timer

    t = Timer("test_random()", "from __main__ import test_random")
    print "test_random - no enumerate"
    print t.timeit(NBR_TIMES)

    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print "test_np_random - no enumerate"
    print t.timeit(NBR_TIMES)


    t = Timer("test_random(True)", "from __main__ import test_random")
    print "test_random - enumerate"
    print t.timeit(NBR_TIMES)

    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print "test_np_random - enumerate"
    print t.timeit(NBR_TIMES)


What's the best way to speed this up, and why does enumerate slow things down so dramatically?

EDIT: The reason I'm using enumerate is that I need both the index and the value of the current item.
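
To make that concrete, the kind of per-element update I have in mind looks roughly like this (some_func here is just a placeholder for the real calculation, which depends on both the index and the value):

import numpy as np

def some_func(index, value):
    # placeholder: the real calculation uses both the index and the value
    return value * index + 1

y = np.random.random(100000)
for index, item in enumerate(y):
    y[index] = some_func(index, item)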

2 answers


To take full advantage of numpy's speed, you want to create ufuncs whenever possible. Applying vectorize, as mgibsonbr suggests, is one way to do that, but a better way, if possible, is simply to build a function that uses numpy's built-in ufuncs. So something like this:

>>> import numpy
>>> a = numpy.random.random(10)
>>> a + 1
array([ 1.29738145,  1.33004628,  1.45825441,  1.46171177,  1.56863326,
        1.58502855,  1.06693054,  1.93304272,  1.66056379,  1.91418473])
>>> (a + 1) * 0.25 / 4
array([ 0.08108634,  0.08312789,  0.0911409 ,  0.09135699,  0.09803958,
        0.09906428,  0.06668316,  0.12081517,  0.10378524,  0.11963655])


What is the nature of the function you want to apply to a numpy array? If you let us know, maybe we can help you come up with a version that only uses numpy ufuncs.
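
For example, assuming the real calculation can be expressed with numpy operations, the enumerate loop in your benchmark collapses to a single array expression, along these lines:

import numpy as np

NBR_ELEMENTS = 100000

def test_np_random_vectorized():
    y = np.random.random(NBR_ELEMENTS)
    # one array-wide expression instead of a Python-level loop;
    # swap "1 + y" for whatever combination of ufuncs models the real calculation
    return 1 + y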

It is also possible to generate an array of indices without using enumerate. Numpy provides ndenumerate, which is an iterator and probably slower, but it also provides indices, which is a very fast way to generate the indices corresponding to the values in an array. So, in this case...

>>> numpy.indices(a.shape)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
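
For comparison, ndenumerate yields the same (index, value) pairs as enumerate, but it still iterates at the Python level, so this is only a sketch of the interface rather than a speed-up:

import numpy

a = numpy.random.random(3)
pairs = [(idx, val) for idx, val in numpy.ndenumerate(a)]
# each idx is a tuple such as (0,), each val is the corresponding element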




To be more explicit, you can combine the above with numpy.rec.fromarrays:

>>> a = numpy.random.random(10)
>>> ind = numpy.indices(a.shape)
>>> numpy.rec.fromarrays([ind[0], a])
rec.array([(0, 0.092473494150913438), (1, 0.20853257641948986),
       (2, 0.35141455604686067), (3, 0.12212258656960817),
       (4, 0.50986868372639049), (5, 0.0011439325711705139),
       (6, 0.50412473457942508), (7, 0.28973489788728601),
       (8, 0.20078799423168536), (9, 0.34527678271856999)], 
      dtype=[('f0', '<i8'), ('f1', '<f8')])
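
If it helps, the two columns of that record array can be pulled back out by field name (f0 and f1 are the default names numpy assigns, as the dtype above shows):

import numpy

a = numpy.random.random(10)
ind = numpy.indices(a.shape)
rec = numpy.rec.fromarrays([ind[0], a])
indices = rec['f0']   # the index column
values = rec['f1']    # the random-value column
pair = rec[3]         # a single (index, value) record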


It's starting to sound like your real problem is performing the operation in place. That's harder to do with vectorize, but it's easy with the ufunc approach:

>>> def somefunc(a):
...     a += 1
...     a /= 15
... 
>>> a = numpy.random.random(10)
>>> b = a
>>> somefunc(a)
>>> a
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
>>> b
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])


As you can see, numpy does these operations in place.
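
Another sketch of in-place work uses the augmented assignment operators and the out= argument that the built-in ufuncs accept:

import numpy

a = numpy.random.random(10)
a += 1                           # in-place add; no new array is allocated
numpy.multiply(a, 0.25, out=a)   # the ufunc writes its result straight back into a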


Check out numpy.vectorize; it should allow you to apply arbitrary functions to numpy arrays. For your simple example, you would do something like this:

import numpy

vecFunc = numpy.vectorize(lambda x: x + 1)
y = vecFunc(y)
      



However, this will create a new numpy array instead of modifying it in place (which may or may not be a problem in your particular case).
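
Since the edit mentions needing the index as well, one possible sketch (the function here is purely illustrative) is to vectorize a two-argument function and pass an index array alongside the values:

import numpy

y = numpy.random.random(100000)
vecFunc = numpy.vectorize(lambda i, x: x + i * 0.001)  # hypothetical function of index and value
y = vecFunc(numpy.arange(y.size), y)

Keep in mind that vectorize is essentially a Python-level loop under the hood, so this is about convenience rather than raw speed.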

In general, you will always be better off manipulating numpy arrays with numpy functions than iterating over them with Python code: the former are not only optimized but implemented in C, while the latter run element by element in the interpreter.
