I am using "np.count_nonzero(myarray) > smallvalue" on a bunch of NumPy arrays. Can I stop counting partway through once the count exceeds smallvalue?

The arrays I check are boolean, and for boolean arrays np.count_nonzero() seems to be the most efficient way to compute the "sum". Still, I wonder whether this could be done faster, perhaps by doing the greater-than check while counting.
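For a boolean array, np.count_nonzero and a plain sum return the same number; the comparison with smallvalue is then a separate step done after the full count. A tiny illustration (the array and threshold are made up for the example):

```python
import numpy as np

mask = np.array([True, False, True, True, False])
smallvalue = 1

# Both count the True entries; sum() is a generic add-reduction that
# accumulates in the default integer dtype, while count_nonzero only counts.
n_true = np.count_nonzero(mask)
n_sum = int(mask.sum())

# The greater-than check only happens after the whole array has been counted.
print(n_true, n_sum, n_true > smallvalue)  # 3 3 True
```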

Here is a toy example that times my approach on one huge array rather than many small ones (I know I should be using timeit, and averaging over 100 runs is pretty crude, but whatever), followed by the same test on a small array to show how much faster it could be:

import time
import numpy as np

hugeflatarray = np.ones(100000000, dtype=bool)
smallflatarray = np.ones(10, dtype=bool)
smallvalue = 1

# time.clock() was removed in Python 3.8; time.perf_counter() is the
# high-resolution timer intended for measuring short intervals.
mytimes = []
for i in range(100):
    t1 = time.perf_counter()
    np.count_nonzero(hugeflatarray) > smallvalue
    t2 = time.perf_counter()
    mytimes.append(t2 - t1)
print("average time for huge array: " + str(np.mean(mytimes)))

mytimes = []
for i in range(100):
    t1 = time.perf_counter()
    np.count_nonzero(smallflatarray) > smallvalue
    t2 = time.perf_counter()
    mytimes.append(t2 - t1)
print("average time for small array: " + str(np.mean(mytimes)))

average time for huge array: 0.0111809413765

average time for small array: 9.83558325865e-07
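As the question itself notes, the standard-library timeit module is the more conventional tool for this kind of measurement. A minimal sketch of the same comparison, assuming a smaller "huge" array (10**7 elements instead of 10**8) to keep memory use and run time modest; the helper name per_call_seconds is illustrative:

```python
import timeit

import numpy as np


def per_call_seconds(arr, smallvalue, number=100, repeat=5):
    """Best-of-`repeat` average time (seconds) of one count_nonzero comparison."""
    # timeit.repeat runs the callable `number` times per trial and returns the
    # total time of each trial; the minimum is the least noisy estimate.
    totals = timeit.repeat(
        lambda: np.count_nonzero(arr) > smallvalue,
        number=number,
        repeat=repeat,
    )
    return min(totals) / number


hugeflatarray = np.ones(10_000_000, dtype=bool)  # smaller than the question's 10**8
smallflatarray = np.ones(10, dtype=bool)
smallvalue = 1

huge_t = per_call_seconds(hugeflatarray, smallvalue)
small_t = per_call_seconds(smallflatarray, smallvalue)
print(f"average time for huge array:  {huge_t:.3g} s")
print(f"average time for small array: {small_t:.3g} s")
```

Taking the minimum over several trials, rather than the mean of single calls, reduces the influence of other processes on the machine.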

np.count_nonzero() presumably works by looping through the entire array and accumulating a count as it goes, right? Wouldn't it be faster if there were a way to stop as soon as the count exceeds smallvalue, i.e. to "short-circuit"?
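Without numba, one way to approximate a short circuit in pure NumPy is to count in chunks and check the running total between chunks. This is a swapped-in technique, not something np.count_nonzero does on its own; the helper name count_exceeds and the default chunk size of 65,536 elements are assumptions chosen for illustration:

```python
import numpy as np


def count_exceeds(arr, smallvalue, chunk_size=1 << 16):
    """Return np.count_nonzero(arr) > smallvalue, stopping early when possible.

    Hypothetical helper: each chunk is still counted in C by np.count_nonzero,
    and Python only checks the running total between chunks.
    """
    flat = arr.ravel()  # a view for contiguous arrays, a copy otherwise
    count = 0
    for start in range(0, flat.size, chunk_size):
        count += np.count_nonzero(flat[start:start + chunk_size])
        if count > smallvalue:
            return True  # remaining chunks can only increase the count
    return count > smallvalue  # whole array scanned (or empty)


huge = np.ones(10_000_000, dtype=bool)
print(count_exceeds(huge, 1))  # True, after counting only the first chunk
```

The early exit only pays off when enough True values appear near the start of the array; if the threshold is never exceeded, every chunk is still scanned, plus a little per-chunk Python overhead.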

Edit:

@user2357112 Following your advice, I tried a numba solution, and it looks a little faster than np.count_nonzero(hugearray) > smallvalue. Thanks! Here is my solution:

import numba

@numba.jit(numba.boolean(numba.boolean[:], numba.int64), nopython=True)
def jitcountgreaterthan(hugearray, smallvalue):
    a = numba.int64(0)
    for i in hugearray:
        if i:
            a += 1
            # Stop once the count exceeds smallvalue; comparing with > (not ==)
            # gives the same answer as np.count_nonzero(hugearray) > smallvalue.
            if a > smallvalue:
                break
    return a > smallvalue

I used this odd "break, then return" structure because numba didn't seem to support return statements inside a for loop; in practice it makes no difference.
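The early-exit logic in the numba function is ordinary Python that numba compiles, so an undecorated copy can be checked against NumPy on small arrays without numba installed. The sketch below (function names are hypothetical) uses `>` so that it matches np.count_nonzero(arr) > smallvalue. It also shows that a variant which breaks and returns on `==` answers a different question: whether the count reached smallvalue. That variant gives True for a single True value with smallvalue = 1:

```python
import numpy as np


def count_greater_than_py(arr, smallvalue):
    """Pure-Python early-exit loop: True iff arr has more than smallvalue True entries."""
    a = 0
    for x in arr:
        if x:
            a += 1
            if a > smallvalue:
                break  # the answer is already known to be True
    return a > smallvalue


def stops_on_equal(arr, smallvalue):
    """Variant that breaks and returns on ==, i.e. 'did the count reach smallvalue?'."""
    a = 0
    for x in arr:
        if x:
            a += 1
        if a == smallvalue:
            break
    return a == smallvalue


rng = np.random.default_rng(42)
for size in (0, 1, 5, 50, 500):
    arr = rng.random(size) < 0.3  # boolean array, roughly 30% True
    for smallvalue in (0, 1, 2, 10):
        assert count_greater_than_py(arr, smallvalue) == (np.count_nonzero(arr) > smallvalue)

one_true = np.array([True])
print(stops_on_equal(one_true, 1), np.count_nonzero(one_true) > 1)  # True False
```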

performance python arrays numpy

440hertz 27 Mar 17 at 17:36