NumPy: select rows without zeros
You can detect the zeros with data == 0, which gives you a boolean array, and then run np.any along each row of it. Alternatively, you can detect the non-zeros with data != 0
and then run np.all
along each row to keep only the rows without zeros.
np.einsum can also be used to replace np.any,
which I personally think is crazy, but in a good way, as it gives us a noticeable performance boost, as we'll confirm later in this answer.
Thus, you have the three approaches listed below.
Approach # 1:
rows_without_zeros = data[~np.any(data==0, axis=1)]
Approach # 2:
rows_without_zeros = data[np.all(data!=0, axis=1)]
Approach # 3:
rows_without_zeros = data[~np.einsum('ij->i', data==0)]
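As a quick sanity check, here is a minimal sketch showing that all three approaches select the same rows (the small data array is just an invented example for illustration):

import numpy as np

# Toy input made up for illustration: the second and fourth rows contain a zero.
data = np.array([[1, 2, 3],
                 [4, 0, 6],
                 [7, 8, 9],
                 [0, 1, 2]])

m1 = data[~np.any(data==0, axis=1)]        # Approach #1
m2 = data[np.all(data!=0, axis=1)]         # Approach #2
m3 = data[~np.einsum('ij->i', data==0)]    # Approach #3

# All three keep only the rows [1, 2, 3] and [7, 8, 9].
assert (m1 == m2).all() and (m2 == m3).all()
print(m1)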
Runtime tests -
This section times the three approaches listed above and also includes @Ashwini Chaudhary's approach, which is also based on np.all
but does not build an explicit comparison mask (at least not on the front end).
In [129]: data = np.random.randint(-10,10,(10000,10))
In [130]: %timeit data[np.all(data, axis=1)]
1000 loops, best of 3: 1.09 ms per loop
In [131]: %timeit data[np.all(data!=0, axis=1)]
1000 loops, best of 3: 1.03 ms per loop
In [132]: %timeit data[~np.any(data==0,1)]
1000 loops, best of 3: 1 ms per loop
In [133]: %timeit data[~np.einsum('ij->i',data ==0)]
1000 loops, best of 3: 825 µs per loop
So it seems that supplying explicit masks to np.all
or np.any
gives a slight (about 9%) performance improvement over the mask-free approach. With einsum you are looking at roughly a 20% improvement over the np.any and np.all approaches, which is not bad!
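For anyone wondering why einsum can stand in for np.any here: with a boolean input, np.einsum('ij->i', ...) reduces each row while keeping the bool dtype, and a sum that saturates at True behaves like a row-wise logical OR. A minimal sketch of that idea (the mask array is just an invented example):

import numpy as np

mask = np.array([[False, True, False],
                 [False, False, False]])

# Reducing a boolean array in bool dtype saturates at True,
# so the row-wise "sum" acts like a row-wise OR.
print(np.einsum('ij->i', mask))   # [ True False]
print(np.any(mask, axis=1))       # [ True False] -- same result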