Missing columns when filtering with np.all, and getting the indices of the dropped columns
I have a dataset with shape (400, 40). Some columns are entirely zero. They are not needed for the calculations (I need to ignore them), but they must still be there when I write the file back out.
So I use numpy to load the data as an array and do some initialization. The problem comes up when I try to invert the matrix (again, necessary for the calculations). As far as I know, a matrix with an all-zero column cannot be inverted (det(M) = 0).
So I use this to get the nonzero columns:
nonZero = dataSet[:, np.all(dataSet != 0, axis=0)]
(I also tried summing each column with np.sum inside np.all.) But it drops some columns for no apparent reason.
For example, my first line has:
[ 0, -1, -2, -3, 181, 5451, 0, 0, 8, 8, 1, 9, 9, 1, 0.11, 0, 0 ] etc.
When I run the above code, I get:
[ -1. -2. -3. 181. 8. 8. 1. 9. 9. 1. ]
5451 and 0.11 disappear even though their columns are not entirely zero.
I also need the indices of the deleted columns, since I have to write those columns back after the calculations.
I'm not the best Python coder, and I can't seem to solve the problem or understand why this is happening. I only recently learned to use numpy and am just getting started with it. This has bothered me for 2 days already. Any advice or help is appreciated.
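np.all behaves like a logical "and": with dataSet != 0 it keeps a column only if every value in it is nonzero, so a single 0 anywhere in the column drops it. You want np.any, which behaves like a logical "or" and keeps a column as soon as one value is nonzero, ignoring any zeros in an otherwise nonzero column. For example: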
dataSet = np.array([
    [0, -1, -2, -3, 181, 5451, 0, 0, 8, 8, 1, 9, 9, 1, 0.11, 0, 0],
    [0, -1, -2, -3, 181, 5451, 0, 0, 8, 8, 1, 9, 9, 1, 0, 0, 0],
    [0, -1, -2, -3, 181, 0, 0, 0, 8, 8, 1, 9, 9, 1, 0.11, 0, 0],
])
nonZero = dataSet[:, np.any(dataSet, axis=0)]
nonZero
array([[-1.00000000e+00, -2.00000000e+00, -3.00000000e+00, 1.81000000e+02,
        5.45100000e+03, 8.00000000e+00, 8.00000000e+00, 1.00000000e+00,
        9.00000000e+00, 9.00000000e+00, 1.00000000e+00, 1.10000000e-01],
       [-1.00000000e+00, -2.00000000e+00, -3.00000000e+00, 1.81000000e+02,
        5.45100000e+03, 8.00000000e+00, 8.00000000e+00, 1.00000000e+00,
        9.00000000e+00, 9.00000000e+00, 1.00000000e+00, 0.00000000e+00],
       [-1.00000000e+00, -2.00000000e+00, -3.00000000e+00, 1.81000000e+02,
        0.00000000e+00, 8.00000000e+00, 8.00000000e+00, 1.00000000e+00,
        9.00000000e+00, 9.00000000e+00, 1.00000000e+00, 1.10000000e-01]])
If you want the indices of the dropped (all-zero) columns, you can use np.where, i.e.
np.where(~dataSet.any(axis=0))
Output:
(array([0, 6, 7, 15, 16]),)
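Since the question also asks about writing the zero columns back after the calculation, here is a minimal sketch of one way to do it. It assumes the calculation leaves the number of columns unchanged. The doubling step is only a hypothetical stand-in for the real computation, and the names keep, dropped, processed and restored are illustrative, not from the original post.

```python
import numpy as np

# Shortened version of the data from the question.
dataSet = np.array([
    [0, -1, -2, 5451, 0, 0.11, 0],
    [0, -1, -2, 5451, 0, 0.0, 0],
    [0, -1, -2, 0, 0, 0.11, 0],
])

# Boolean mask: True for columns with at least one nonzero value.
keep = np.any(dataSet != 0, axis=0)

# Indices of the dropped (all-zero) columns.
dropped = np.flatnonzero(~keep)

# Work only on the kept columns; doubling is a placeholder calculation.
processed = dataSet[:, keep] * 2

# Rebuild a full-width array: dropped columns stay zero,
# results go back into their original positions.
restored = np.zeros_like(dataSet)
restored[:, keep] = processed
```

Keeping the boolean mask around, rather than only the filtered array, is what makes it possible to put the results back in the right columns.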
There is a mistake in your logic. You don't want to keep only the columns where all values are nonzero. Given your explanation, you want to drop the columns that are entirely zero:
For example:
arr = np.array([[1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
                [1, 0, 0, 1, 1, 1, 1, 0, 0, 1],
                [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
                [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]])
arr[:, ~np.all(arr == 0, axis=0)]
# array([[1, 1, 1, 0, 0, 1, 0, 1],
#        [1, 0, 1, 1, 1, 1, 0, 1],
#        [0, 1, 1, 0, 1, 0, 0, 0],
#        [1, 0, 1, 0, 0, 0, 1, 0],
#        [0, 0, 1, 0, 0, 0, 1, 0]])
But you can also use np.any instead of np.all:
arr[:, np.any(arr != 0, axis=0)]
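The two forms are interchangeable because "not all elements are zero" is the same statement as "some element is nonzero". A small sketch that checks this on a toy array (the names mask_all, mask_any and same are illustrative):

```python
import numpy as np

arr = np.array([[1, 0, 0],
                [0, 0, 2]])

# not (every element in the column is zero)
mask_all = ~np.all(arr == 0, axis=0)

# at least one element in the column is nonzero
mask_any = np.any(arr != 0, axis=0)

# Both masks select the same columns.
same = np.array_equal(mask_all, mask_any)
```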
It is always better to work with smaller examples.
For example:
arr = np.array([
[0, 0, 1],
[0, 1, 1]
])
nonzero = arr != 0
print(nonzero)
# prints
# [[False False True]
# [False True True]]
all_nonzero = np.all(nonzero, axis=0)
print(all_nonzero)
# prints
# [False False True]
Now you see the problem. Your logic creates a column mask that only selects columns in which every element is nonzero. What you really want is the columns where not all elements are zero, or, put another way, where any element in the column is nonzero.
any_nonzero = np.any(nonzero, axis=0)
print(any_nonzero)
# prints
# [False True True]
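To finish the small example, the sketch below applies both masks to the array so the difference in the selected columns is visible. The variable names kept_all and kept_any are illustrative.

```python
import numpy as np

arr = np.array([
    [0, 0, 1],
    [0, 1, 1],
])

nonzero = arr != 0

# Mask from the question: only columns with no zeros at all.
kept_all = arr[:, np.all(nonzero, axis=0)]

# Corrected mask: every column that has at least one nonzero value.
kept_any = arr[:, np.any(nonzero, axis=0)]
```

Here kept_all keeps only the last column, while kept_any keeps the last two and drops only the all-zero first column.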