NumPy: select rows without zeros
You can detect the zeros with data == 0, which gives you a boolean array, and then run np.any along each row of it. Alternatively, you can detect the non-zeros with data != 0
and then run np.all
along each row to keep only the rows without zeros.
np.einsum can also be used to replace np.any,
which I personally think is crazy, but in a good way, as it gives us a noticeable performance boost, as we'll confirm later in this answer.
Thus, you have the three approaches listed below.
Approach # 1:
rows_without_zeros = data[~np.any(data==0, axis=1)]
Approach # 2:
rows_without_zeros = data[np.all(data!=0, axis=1)]
Approach # 3:
rows_without_zeros = data[~np.einsum('ij->i', data==0)]
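As a quick sanity check, here is a minimal sketch showing that all three approaches select the same rows (the small data array is just an invented example for illustration):

import numpy as np

# Toy input made up for illustration: the second and fourth rows contain a zero.
data = np.array([[1, 2, 3],
                 [4, 0, 6],
                 [7, 8, 9],
                 [0, 1, 2]])

m1 = data[~np.any(data==0, axis=1)]        # Approach #1
m2 = data[np.all(data!=0, axis=1)]         # Approach #2
m3 = data[~np.einsum('ij->i', data==0)]    # Approach #3

# All three keep only the rows [1, 2, 3] and [7, 8, 9].
assert (m1 == m2).all() and (m2 == m3).all()
print(m1)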
Runtime tests -
This section times the three approaches listed above and also includes @Ashwini Chaudhary's approach, which is also based on np.all
but does not build an explicit comparison mask (at least not on the front end).
In [129]: data = np.random.randint(-10,10,(10000,10))
In [130]: %timeit data[np.all(data, axis=1)]
1000 loops, best of 3: 1.09 ms per loop
In [131]: %timeit data[np.all(data!=0, axis=1)]
1000 loops, best of 3: 1.03 ms per loop
In [132]: %timeit data[~np.any(data==0,1)]
1000 loops, best of 3: 1 ms per loop
In [133]: %timeit data[~np.einsum('ij->i',data ==0)]
1000 loops, best of 3: 825 µs per loop
So it seems that supplying explicit masks to np.all
or np.any
gives a slight (about 9%) performance improvement over the mask-free approach. With einsum you are looking at roughly a 20% improvement over the np.any and np.all approaches, which is not bad!
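For anyone wondering why einsum can stand in for np.any here: with a boolean input, np.einsum('ij->i', ...) reduces each row while keeping the bool dtype, and a sum that saturates at True behaves like a row-wise logical OR. A minimal sketch of that idea (the mask array is just an invented example):

import numpy as np

mask = np.array([[False, True, False],
                 [False, False, False]])

# Reducing a boolean array in bool dtype saturates at True,
# so the row-wise "sum" acts like a row-wise OR.
print(np.einsum('ij->i', mask))   # [ True False]
print(np.any(mask, axis=1))       # [ True False] -- same result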