How to speed up enumeration for a numpy array / how to efficiently enumerate a numpy array?
I need to generate a lot of random numbers. I tried using random.random
but that function is rather slow, so I switched to numpy.random.random,
which is much faster. So far, so good. The generated random numbers are then used to compute something (as a function of each number), so I enumerate
over each number and replace its value. This seems to kill all of the speedup I had gained. Here are the statistics generated with timeit():
test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158
test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516
As you can see, generating the numbers is almost 10x faster using numpy, but enumerating over them makes the numpy version several times slower than the plain-Python one.
Below is the code I am using:
import numpy as np
import timeit
import random

NBR_TIMES = 10
NBR_ELEMENTS = 100000

def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value; in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value; in reality this will be some function of 'item'
            y[index] = 1 + item

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test_random()", "from __main__ import test_random")
    print("test_random - no enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print("test_np_random - no enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_random(True)", "from __main__ import test_random")
    print("test_random - enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print("test_np_random - enumerate")
    print(t.timeit(NBR_TIMES))
What's the best way to speed this up, and why does enumerate
slow things down so dramatically?
EDIT: The reason I'm using enumerate
is that I need both the index and the value of the current item.
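(Editor's note: for the simple 1 + item transformation used in the benchmark above, the whole enumerate loop can be replaced with a single vectorized expression; a minimal sketch, using the same NBR_ELEMENTS as the question's code:)

```python
import numpy as np

NBR_ELEMENTS = 100000
y = np.random.random(NBR_ELEMENTS)
y = 1 + y  # one vectorized operation replaces the whole enumerate loop
# if the index is also needed, np.arange supplies all of them at once
indices = np.arange(y.size)
```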
To take full advantage of numpy's speed, you want to use ufuncs whenever possible. Applying vectorize
, as mgibsonbr's answer suggests, is one way of doing this, but the best way, if possible, is simply to express your function in terms of numpy's built-in ufuncs. So something like this:
>>> import numpy
>>> a = numpy.random.random(10)
>>> a + 1
array([ 1.29738145, 1.33004628, 1.45825441, 1.46171177, 1.56863326,
1.58502855, 1.06693054, 1.93304272, 1.66056379, 1.91418473])
>>> (a + 1) * 0.25 / 4
array([ 0.08108634, 0.08312789, 0.0911409 , 0.09135699, 0.09803958,
0.09906428, 0.06668316, 0.12081517, 0.10378524, 0.11963655])
What is the nature of the function you want to apply to a numpy array? If you let us know, maybe we can help you come up with a version that only uses numpy ufuncs.
It's also possible to generate an array of indices without using enumerate
. Numpy provides ndenumerate
, which is an iterator and probably slower, but it also provides indices
, which is a very fast way to generate the indices corresponding to the values in an array. For example:
>>> numpy.indices(a.shape)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
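When the per-element computation depends on both the index and the value, the index array from numpy.indices can participate directly in a vectorized expression; the particular formula below is just a hypothetical stand-in for whatever function you actually need:

```python
import numpy as np

a = np.random.random(10)
idx = np.indices(a.shape)[0]  # array([0, 1, ..., 9])
# hypothetical function of index and value, computed without a Python loop
result = a * 2 + idx * 0.1
```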
To be more explicit, you can use the above and combine them with numpy.rec.fromarrays
:
>>> a = numpy.random.random(10)
>>> ind = numpy.indices(a.shape)
>>> numpy.rec.fromarrays([ind[0], a])
rec.array([(0, 0.092473494150913438), (1, 0.20853257641948986),
(2, 0.35141455604686067), (3, 0.12212258656960817),
(4, 0.50986868372639049), (5, 0.0011439325711705139),
(6, 0.50412473457942508), (7, 0.28973489788728601),
(8, 0.20078799423168536), (9, 0.34527678271856999)],
dtype=[('f0', '<i8'), ('f1', '<f8')])
It's starting to sound like your main problem is performing the operation in place. That's harder to do with vectorize
, but it's easy with the ufunc approach:
>>> def somefunc(a):
... a += 1
... a /= 15
...
>>> a = numpy.random.random(10)
>>> b = a
>>> somefunc(a)
>>> a
array([ 0.07158446, 0.07052393, 0.07276768, 0.09813235, 0.09429439,
0.08561703, 0.11204622, 0.10773558, 0.11878885, 0.10969279])
>>> b
array([ 0.07158446, 0.07052393, 0.07276768, 0.09813235, 0.09429439,
0.08561703, 0.11204622, 0.10773558, 0.11878885, 0.10969279])
As you can see, numpy does these operations in place.
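One caveat worth adding (my note, not part of the original answer): the in-place behavior depends on the augmented operators. a += 1 mutates the existing array, while a = a + 1 rebinds the local name to a freshly allocated array and leaves the caller's array untouched:

```python
import numpy as np

def modifies(a):
    a += 1     # in-place ufunc: the caller's array changes

def rebinds(a):
    a = a + 1  # allocates a new array; the caller's array is unchanged

x = np.zeros(3)
modifies(x)    # x is now all ones
y = np.zeros(3)
rebinds(y)     # y is still all zeros
```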
Check out numpy.vectorize ; it allows you to apply arbitrary functions to numpy arrays. For your simple example, you would do something like this:
>>> vecFunc = numpy.vectorize(lambda x: x + 1)
>>> y = vecFunc(y)
However, this creates a new numpy array instead of modifying the input in place (which may or may not be a problem in your particular case).
In general, you'll always be better off manipulating numpy arrays with numpy functions than iterating over them with Python code, since the former are not only optimized but implemented in C, while the latter are interpreted one element at a time.
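One caveat worth noting: numpy.vectorize is a convenience, not a performance tool; it still calls the Python function once per element under the hood. A quick sketch showing it produces the same result as the true ufunc expression, which runs in C:

```python
import numpy as np

y = np.random.random(1000)
vecFunc = np.vectorize(lambda x: x + 1)  # Python-level loop in disguise
slow = vecFunc(y)
fast = 1 + y                             # true ufunc, runs in C
```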