How do I compare two numpy arrays of strings with the "in" operator to get a boolean array using a broadcast array?

Python allows a simple check if a string is contained in another string:

'ab' in 'abcd'


which evaluates to True.


Now take a numpy array of strings, and you can do this:

import numpy as np
A0 = np.array(['z', 'u', 'w'],dtype=object)

A0[:,None] != A0


The result is a boolean array:

array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)


Now let's take another array:

A1 = np.array(['u_w', 'u_z', 'w_z'],dtype=object)


I want to check where a string in A0 is not contained in a string in A1, essentially creating unique combinations. However, the following does not yield a boolean array, only a single boolean value, no matter how I write the indices:

A0[:,None] not in A1


I also tried using numpy.in1d and np.ndarray.__contains__, but those methods don't do the trick either.

Performance is an issue, so I want to take full advantage of numpy's optimizations.

How can I achieve this?


I found it can be done like this:

fv = np.vectorize(lambda x,y: x not in y)


But as stated in the numpy documentation:

The vectorization feature is provided primarily for convenience, not performance. The implementation is essentially a for loop.

So this is the same as just looping through the array, and it would be nice to solve this without an explicit or implicit for loop.



1 answer

We can convert the arrays to a string dtype and then use one of the NumPy-based string functions.

So using np.char.count, one solution would be -
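A minimal sketch of the np.char.count approach, assuming the A0 and A1 arrays from the question (the names counts and mask_count are only illustrative). Both arrays are converted to a str dtype explicitly, and a count of zero marks the pairs where the string from A0 does not occur in the string from A1:

```python
import numpy as np

A0 = np.array(['z', 'u', 'w'], dtype=object)
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)

# Rows follow A0, columns follow A1: A0 gets a trailing axis so the
# two arrays broadcast to a (3, 3) grid of pairs.
counts = np.char.count(A1.astype(str), A0.astype(str)[:, None])

# Zero occurrences means "not contained".
mask_count = counts == 0
print(mask_count)
```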



An alternative option is np.char.find -
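A sketch of the np.char.find variant under the same assumptions (arrays from the question, illustrative name mask_find). np.char.find returns the lowest index of the substring, or -1 when it is absent, so comparing against -1 gives the "not in" mask:

```python
import numpy as np

A0 = np.array(['z', 'u', 'w'], dtype=object)
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)

# -1 signals that the A0 string (row) was not found in the A1 string (column).
mask_find = np.char.find(A1.astype(str), A0.astype(str)[:, None]) == -1
print(mask_find)
```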




Another option is to use np.char.rfind -
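A sketch of the np.char.rfind variant, again assuming the arrays from the question (mask_rfind is an illustrative name). np.char.rfind searches from the right but also returns -1 when the substring is absent, so the comparison is the same as with np.char.find:

```python
import numpy as np

A0 = np.array(['z', 'u', 'w'], dtype=object)
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)

# Highest index of each A0 string within each A1 string, or -1 if absent.
mask_rfind = np.char.rfind(A1.astype(str), A0.astype(str)[:, None]) == -1
print(mask_rfind)
```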




If we convert one array to a str dtype, we can skip converting the other, since that conversion is done internally anyway. So the last method can be simplified to the form used in the example run below, where only A1 is converted -



Example run -

In [97]: A0
Out[97]: array(['z', 'u', 'w'], dtype=object)

In [98]: A1
Out[98]: array(['u_w', 'u_z', 'w_z', 'zz'], dtype=object)

In [99]: np.char.rfind(A1.astype(str),A0[:,None])==-1
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)

# Loopy solution using np.vectorize for verification
In [100]: fv = np.vectorize(lambda x,y: x not in y)

In [102]: fv(A0[:,None],A1)
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)



