# How to speed up enumeration of a numpy array / how to efficiently enumerate a numpy array?

I need to generate a lot of random numbers. I tried using `random.random`, but this function is rather slow, so I switched to `numpy.random.random`, which is much faster. So far, so good. The generated random numbers are actually used to compute something (as a function of each number), so I `enumerate` over them and replace each value. This seems to be killing all of my previously achieved speedup. Here are the statistics generated with `timeit()`:

```
test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158

test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516
```

As you can see, generating the numbers is almost 10x faster using numpy, but enumerating over them erases that advantage completely (in fact, the numpy version ends up noticeably slower).

Below is the code I am using:

```
import numpy as np
import random
from timeit import Timer

NBR_TIMES = 10
NBR_ELEMENTS = 100000

def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value; in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value; in reality this will be some function of 'item'
            y[index] = 1 + item

if __name__ == '__main__':
    t = Timer("test_random()", "from __main__ import test_random")
    print("test_random - no enumerate")
    print(t.timeit(NBR_TIMES))

    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print("test_np_random - no enumerate")
    print(t.timeit(NBR_TIMES))

    t = Timer("test_random(True)", "from __main__ import test_random")
    print("test_random - enumerate")
    print(t.timeit(NBR_TIMES))

    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print("test_np_random - enumerate")
    print(t.timeit(NBR_TIMES))
```

What's the best way to speed this up, and why does `enumerate` slow things down so dramatically?

EDIT: The reason I'm using `enumerate` is that I need both the index and the value of the current item.

---

To take full advantage of numpy's speed, you want to use ufuncs whenever possible. Applying `vectorize`, as mgibsonbr's answer suggests, is one way to do this, but a better way, if possible, is simply to build your function out of numpy's built-in ufuncs. So something like this:

```
>>> import numpy
>>> a = numpy.random.random(10)
>>> a + 1
array([ 1.29738145,  1.33004628,  1.45825441,  1.46171177,  1.56863326,
        1.58502855,  1.06693054,  1.93304272,  1.66056379,  1.91418473])
>>> (a + 1) * 0.25 / 4
array([ 0.08108634,  0.08312789,  0.0911409 ,  0.09135699,  0.09803958,
        0.09906428,  0.06668316,  0.12081517,  0.10378524,  0.11963655])
```

What is the nature of the function you want to apply to a numpy array? If you let us know, maybe we can help you come up with a version that only uses numpy ufuncs.

It is also possible to generate an array of indices without using `enumerate`. Numpy provides `ndenumerate`, which is an iterator and probably slower, but it also provides `indices`, which is a very fast way to generate the indices that correspond to the values in an array. So:

```
>>> numpy.indices(a.shape)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
```
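If, as in the question, the replacement needs both the index and the value, those indices can feed a single vectorized expression instead of a loop. A minimal sketch (the function `x + i * 0.1` is purely a hypothetical stand-in for the real computation):

```python
import numpy as np

a = np.random.random(10)
ind = np.indices(a.shape)[0]  # 1-D array of indices: [0, 1, ..., 9]

# hypothetical function of both index and value, built from ufuncs
b = a + ind * 0.1

# same result as the enumerate loop, but computed in C
expected = np.array([x + i * 0.1 for i, x in enumerate(a)])
print(np.allclose(b, expected))
```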

To be more explicit, you can take the above and combine them with `numpy.rec.fromarrays`:

```
>>> a = numpy.random.random(10)
>>> ind = numpy.indices(a.shape)
>>> numpy.rec.fromarrays([ind, a])
rec.array([(0, 0.092473494150913438), (1, 0.20853257641948986),
           (2, 0.35141455604686067), (3, 0.12212258656960817),
           (4, 0.50986868372639049), (5, 0.0011439325711705139),
           (6, 0.50412473457942508), (7, 0.28973489788728601),
           (8, 0.20078799423168536), (9, 0.34527678271856999)],
          dtype=[('f0', '<i8'), ('f1', '<f8')])
```
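If it helps, the record array's fields can then be accessed by name (they default to `f0`, `f1`, ...). A small sketch:

```python
import numpy as np

a = np.random.random(10)
ind = np.indices(a.shape)[0]  # take the 1-D row so both arrays share a shape
rec = np.rec.fromarrays([ind, a])

# each field is accessible as an attribute and behaves like a plain ndarray
print((rec.f0 == np.arange(10)).all())
print(np.allclose(rec.f1, a))
```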

It's starting to sound like your main problem is performing the operation in place. That's harder to do with `vectorize`, but it's easy with the ufunc approach:

```
>>> def somefunc(a):
...     a += 1
...     a /= 15
...
>>> a = numpy.random.random(10)
>>> b = a
>>> somefunc(a)
>>> a
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
>>> b
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
```

As you can see, numpy does these operations in place.
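The same in-place idea can also be written with the ufuncs' `out` argument, which writes results directly into an existing array. A sketch using the same `+1`, `/15` example:

```python
import numpy as np

a = np.random.random(10)
alias = a  # second reference to the same array

# ufuncs accept an `out` argument; passing the input array itself
# makes the operation in-place, avoiding a temporary allocation
np.add(a, 1, out=a)
np.divide(a, 15, out=a)

# the alias sees the new values because no new array was created
print(alias is a)
```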

---

Check out `numpy.vectorize`; it should allow you to apply arbitrary functions to numpy arrays. For your simple example, you would do something like this:

```
vecFunc = numpy.vectorize(lambda x: x + 1)
vecFunc(y)
```

However, this will create a new numpy array instead of modifying it in place (which may or may not be a problem in your particular case).

In general, you will always be better off manipulating numpy arrays with numpy functions than iterating with Python functions, since the former are not only optimized but implemented in C, while the latter run in the interpreter.
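As a rough illustration of that point, here is a benchmark sketch (the `1 + x` function and the repetition count are just placeholders) comparing the enumerate loop, `numpy.vectorize`, and a plain ufunc expression:

```python
import timeit
import numpy as np

y = np.random.random(100000)

def with_enumerate(a):
    out = a.copy()
    for i, item in enumerate(out):
        out[i] = 1 + item
    return out

vec_func = np.vectorize(lambda x: x + 1)

t_loop = timeit.timeit(lambda: with_enumerate(y), number=10)
t_vec = timeit.timeit(lambda: vec_func(y), number=10)
t_ufunc = timeit.timeit(lambda: y + 1, number=10)

print(f"enumerate loop: {t_loop:.4f}s")
print(f"np.vectorize:   {t_vec:.4f}s")
print(f"ufunc (y + 1):  {t_ufunc:.4f}s")
```

On a typical machine the ufunc version wins by a wide margin, since the whole loop runs in compiled code.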
