NumPy: search an array for multiple values and return their indices

How can I find a small set of values in a NumPy array (which is not sorted and must not be modified)? The lookup should return the indices of these values.

For example:

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])   # in general it will be large
query = np.array(['a', 'v', 'd'])

# Required:
indx = someNumpyFunction(a, query)

print(indx)       # should be [5, 1, 0]

      

I'm new to NumPy and couldn't find the right way to do this for multiple values at once (I know np.where(a == 'd') works for a single-value lookup).



2 answers


The classic way to test one array against another is to adjust the shape of one of them so that the comparison broadcasts, and then use "==" (arr here is the array a from the question):

In [250]: arr==query[:,None]
Out[250]: 
array([[False, False, False, False, False,  True],
       [False,  True, False, False, False, False],
       [ True, False, False, False, False, False]], dtype=bool)

In [251]: np.where(arr==query[:,None])
Out[251]: (array([0, 1, 2]), array([5, 1, 0]))

      

If an item in query is not found in a, its row will be missing from the result, e.g. [0,2] instead of [0,1,2]:

In [261]: np.where(arr==np.array(['a','x','v'],dtype='S')[:,None])
Out[261]: (array([0, 2]), array([5, 1]))   
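
As a rough sketch of how the broadcast comparison above could be turned into one index per query item, with a placeholder for items that are not found: the function name find_first_indices and the missing=-1 convention are illustrative assumptions, not part of NumPy or of the original answer.

```python
import numpy as np

def find_first_indices(a, query, missing=-1):
    """Hypothetical helper: index of the first match in a for each query item.

    Items of query that do not occur in a are reported as `missing`.
    """
    a = np.asarray(a)
    query = np.asarray(query)
    # Broadcast to a (len(query), len(a)) boolean table, one row per query item.
    mask = a == query[:, None]
    # argmax returns the position of the first True in each row
    # (and 0 for a row that is entirely False).
    first = mask.argmax(axis=1)
    # Rows without any True belong to items that are not in a.
    found = mask.any(axis=1)
    return np.where(found, first, missing)

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])
print(find_first_indices(a, np.array(['a', 'v', 'd'])).tolist())  # [5, 1, 0]
print(find_first_indices(a, np.array(['a', 'x', 'v'])).tolist())  # [5, -1, 1]
```

Note that the boolean table has len(query) * len(a) entries, so this only suits a small query, as in the question.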

      

For this small example, this is significantly faster than the equivalent list comprehension:

np.hstack([(arr==i).nonzero()[0] for i in query])

      



This is a little slower than the searchsorted solution. (In that solution, i can go out of bounds if an item in query is not found.)


Stefano suggested fromiter. It saves some time compared with calling hstack on a list:

In [313]: timeit np.hstack([(arr==i).nonzero()[0] for i in query])
10000 loops, best of 3: 49.5 us per loop

In [314]: timeit np.fromiter(((arr==i).nonzero()[0] for i in query), dtype=int, count=len(query))
10000 loops, best of 3: 35.3 us per loop

      

But it raises an error if an item is missing or occurs more than once. hstack can handle entries of variable length; fromiter cannot.

np.flatnonzero(arr==i) is slower than (arr==i).nonzero()[0], but I haven't looked into why.
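
To illustrate the variable-length case mentioned above, here is a small sketch with made-up data in which one query item occurs twice and another does not occur at all. The arrays and variable names are assumptions chosen for the example.

```python
import numpy as np

a = np.array(['d', 'v', 'h', 'v', 'm', 'a'])   # 'v' occurs twice
query = np.array(['a', 'v', 'x'])              # 'x' does not occur in a

# One index array per query item; each may hold 0, 1, or several indices.
per_item = [np.flatnonzero(a == q) for q in query]
lengths = [len(idx) for idx in per_item]

# hstack concatenates the pieces regardless of their individual lengths.
flat = np.hstack(per_item)

print(lengths)        # [1, 2, 0]
print(flat.tolist())  # [5, 1, 3]
```

The flattened result no longer lines up one-to-one with query, so keeping per_item (or lengths) around is one way to tell which indices belong to which item.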



You can use np.searchsorted on a sorted copy of the array and then map the returned indices back to the original array. np.argsort gives you what you need for that, as in:

>>> indx = a.argsort()  # indices that would sort the array
>>> i = np.searchsorted(a[indx], query)  # indices in the sorted array
>>> indx[i]  # indices with respect to the original array
array([5, 1, 0])

      



If a has size n and query has size k, this takes O(n log n + k log n), which is faster than the O(n k) linear search if log n < k.
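
The argsort/searchsorted lookup above assumes that every query item is present in a. A minimal sketch of one way to guard against missing items follows; the function name find_indices_sorted and the missing=-1 convention are assumptions for illustration, and a is assumed to be non-empty.

```python
import numpy as np

def find_indices_sorted(a, query, missing=-1):
    """Hypothetical helper: index in a of each query item, or `missing`."""
    a = np.asarray(a)
    query = np.asarray(query)
    order = a.argsort()                      # indices that would sort a (a is not modified)
    pos = np.searchsorted(a[order], query)   # insertion points in the sorted copy
    # An insertion point equal to len(a) would index out of bounds, so clip it.
    pos = np.clip(pos, 0, len(a) - 1)
    candidates = order[pos]                  # map back to positions in the original array
    # A candidate is a real match only if the value stored there equals the query item.
    return np.where(a[candidates] == query, candidates, missing)

a = np.array(['d', 'v', 'h', 'r', 'm', 'a'])
print(find_indices_sorted(a, np.array(['a', 'v', 'd'])).tolist())  # [5, 1, 0]
print(find_indices_sorted(a, np.array(['a', 'x', 'e'])).tolist())  # [5, -1, -1]
```

If a contains duplicates, this returns one of the matching indices; which one depends on the sort order produced by argsort.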
