Missing elements when slicing with np.all, and getting the indexes of the removed columns

Question

I have a dataset of shape (400, 40). Some columns are entirely zero. They are not needed for the calculations (I need to ignore them), but they are needed when I write the file back out.

So I use numpy to load the data as an array and do some initialization. The problem comes up when I try to invert the matrix (again, necessary for the calculations). As far as I know, a matrix with an all-zero column cannot be inverted (det(M) = 0).

So I use this to get the nonzero columns:

nonZero = dataSet[:, np.all(dataSet != 0, axis=0)]

(I also tried summing each column with np.sum inside np.all), but it drops some columns for no apparent reason.

For example, my first row is:

[ 0, -1, -2, -3, 181, 5451, 0, 0, 8, 8, 1, 9, 9, 1, 0.11, 0, 0 ] etc.

When I run the above code, I get:

[ -1.  -2.  -3.  181.  8.  8.  1.  9.  9.  1.  ]

5451 and 0.11 disappear even though their columns are not entirely 0 and the values themselves are not 0.

I also need the indices of the removed columns, since I have to put those columns back after the calculations ...

I'm not the best Python coder, and I can't seem to solve the problem or understand why this is happening. I only recently learned numpy and I am just getting started with it. This has bothered me for 2 days already. Any advice / help is appreciated.

+3

python windows python-3.x numpy

Tarık Bir Jul 19 17 at 15:32

3 answers

There is a mistake in your logic. You don't want to keep only the columns where all values are nonzero. Given your explanation, you want to drop the columns that are all zero.

For example:

import numpy as np
arr = np.array([[1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
                [1, 0, 0, 1, 1, 1, 1, 0, 0, 1],
                [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
                [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]])
arr[:, ~np.all(arr == 0, axis=0)]
# array([[1, 1, 1, 0, 0, 1, 0, 1],
#        [1, 0, 1, 1, 1, 1, 0, 1],
#        [0, 1, 1, 0, 1, 0, 0, 0],
#        [1, 0, 1, 0, 0, 0, 1, 0],
#        [0, 0, 1, 0, 0, 0, 1, 0]])

But you can also use np.any instead of np.all:

arr[:, np.any(arr != 0, axis=0)]
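The two masks select the same columns, because "not every entry is zero" is the same statement as "at least one entry is nonzero" (De Morgan's law). A minimal check, using a small hypothetical array rather than the one above:

```python
import numpy as np

# Small hypothetical array; columns 1 and 3 are entirely zero.
arr = np.array([[1, 0, 2, 0],
                [0, 0, 3, 0],
                [4, 0, 0, 0]])

mask_all = ~np.all(arr == 0, axis=0)  # "not every entry in the column is zero"
mask_any = np.any(arr != 0, axis=0)   # "at least one entry in the column is nonzero"

print(mask_all)                            # [ True False  True False]
print(np.array_equal(mask_all, mask_any))  # True
```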

+2

MSeifert Jul 19 17 at 15:53

It is always better to work with smaller examples.

For example:

import numpy as np
arr = np.array([
    [0, 0, 1],
    [0, 1, 1]
])

nonzero = arr != 0
print(nonzero)
# prints
# [[False False  True]
#  [False  True  True]]

all_nonzero = np.all(nonzero, axis=0)
print(all_nonzero)
# prints
# [False False  True]

Now you see the problem. Your logic creates a column mask that selects only the columns in which every item is nonzero. What you really want is the columns where not all elements are zero or, put another way, where any element in the column is nonzero.

any_nonzero = np.any(nonzero, axis=0)
print(any_nonzero)
# prints
# [False  True  True]
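To finish the example, the any-based mask can be used directly as a column index. This is only a sketch of that last step; the name kept is an assumption introduced for illustration:

```python
import numpy as np

arr = np.array([
    [0, 0, 1],
    [0, 1, 1]
])

# True for every column that contains at least one nonzero entry.
any_nonzero = np.any(arr != 0, axis=0)

# Boolean indexing on the column axis drops the all-zero column 0.
kept = arr[:, any_nonzero]
print(kept)
# [[0 1]
#  [1 1]]
```

Note that the zero in the first row of the middle column survives, because only columns that are zero everywhere are removed.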

+1

Dunes Jul 19 17 at 15:53

Dark · Accepted Answer · 2017-07-19T15:55:40+0000

np.all behaves like a logical and: with dataSet != 0 it keeps a column only if every value in it is nonzero. You want np.any, which behaves like a logical or, if you want to ignore the zeros that appear in columns that are not entirely zero. For example:

import numpy as np
dataSet = np.array([[ 0, -1, -2, -3, 181, 5451, 0, 0, 8, 8, 1, 9, 9, 1, 0.11, 0, 0 ],
                    [ 0, -1, -2, -3, 181, 5451, 0, 0, 8, 8, 1, 9, 9, 1, 0,    0, 0 ],
                    [ 0, -1, -2, -3, 181, 0,    0, 0, 8, 8, 1, 9, 9, 1, 0.11, 0, 0 ]])
nonZero = dataSet[:, np.any(dataSet, axis=0)]
nonZero

array([[-1.00000000e+00, -2.00000000e+00, -3.00000000e+00,
         1.81000000e+02,  5.45100000e+03,  8.00000000e+00,
         8.00000000e+00,  1.00000000e+00,  9.00000000e+00,
         9.00000000e+00,  1.00000000e+00,  1.10000000e-01],
       [-1.00000000e+00, -2.00000000e+00, -3.00000000e+00,
         1.81000000e+02,  5.45100000e+03,  8.00000000e+00,
         8.00000000e+00,  1.00000000e+00,  9.00000000e+00,
         9.00000000e+00,  1.00000000e+00,  0.00000000e+00],
       [-1.00000000e+00, -2.00000000e+00, -3.00000000e+00,
         1.81000000e+02,  0.00000000e+00,  8.00000000e+00,
         8.00000000e+00,  1.00000000e+00,  9.00000000e+00,
         9.00000000e+00,  1.00000000e+00,  1.10000000e-01]])

If you want the indices of the removed (all-zero) columns, you can use np.where, i.e.

np.where(~dataSet.any(axis=0))

Output:

(array([ 0,  6,  7, 15, 16]),)
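Since the question also asks about putting the removed columns back after the calculations, here is one possible sketch. It assumes the calculation returns an array with the same shape as the reduced matrix; the smaller dataSet, the doubling step, and the names keep, removed_idx, result and restored are all hypothetical and used only for illustration:

```python
import numpy as np

# Smaller hypothetical dataset; columns 0, 2 and 5 are entirely zero.
dataSet = np.array([[0, -1, 0, 5451, 0.11, 0],
                    [0, -1, 0, 5451, 0.0,  0],
                    [0, -1, 0, 0,    0.11, 0]])

keep = dataSet.any(axis=0)           # True for columns with any nonzero entry
removed_idx = np.flatnonzero(~keep)  # indices of the all-zero columns
print(removed_idx)                   # [0 2 5]

nonZero = dataSet[:, keep]
result = nonZero * 2                 # placeholder for the real calculation

# Rebuild an array of the original shape: zeros everywhere,
# with the computed values written back into the kept columns.
restored = np.zeros_like(dataSet)
restored[:, keep] = result
print(restored.shape)                # (3, 6)
```

Keeping the boolean mask around, rather than only the index list, makes the write-back a single assignment.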
