Why does my array lose its mask after multidimensional indexing in NumPy?
I want to use a multidimensional MaskedArray as an array of indices:
Data:
In [149]: np.ma.arange(10, 60, 2)
Out[149]:
masked_array(data = [10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58],
mask = False,
fill_value = 999999)
Indices:
In [140]: np.ma.array(np.arange(20).reshape(4, 5),
mask=np.arange(20).reshape(4, 5) % 3)
Out[140]:
masked_array(data =
[[0 -- -- 3 --]
[-- 6 -- -- 9]
[-- -- 12 -- --]
[15 -- -- 18 --]],
mask =
[[False True True False True]
[ True False True True False]
[ True True False True True]
[False True True False True]],
fill_value = 999999)
Desired output:
In [151]: np.ma.arange(10, 60, 2)[np.ma.array(np.arange(20).reshape(4, 5), mask=np.arange(20).reshape(4, 5) % 3)]
Out[151]:
masked_array(data =
[[10 -- -- 16 --]
[-- 22 -- -- 28]
[-- -- 34 -- --]
[40 -- -- 46 --]],
mask =
False,
fill_value = 999999)
Actual output:
In [160]: np.ma.arange(10, 60, 2)[np.ma.array(np.arange(20).reshape(4, 5), mask=np.arange(20).reshape(4, 5) % 3)]
Out[160]:
masked_array(data =
[[10 12 14 16 18]
[20 22 24 26 28]
[30 32 34 36 38]
[40 42 44 46 48]],
mask =
False,
fill_value = 999999)
Why does the resulting array lose its mask? According to the answer to "Indexing with Masked Arrays in numpy", this way of indexing is very bad. Why?
It looks like indexing with a masked array just ignores the mask. Without digging through the docs or the code, I would say that NumPy array indexing has no special knowledge of the masked array subclass. The array you get is simply the result of normal indexing with arange(20).
But you can do normal indexing and "copy" the mask:
In [13]: data=np.arange(10,60,2)
In [14]: mI = np.ma.array(np.arange(20).reshape(4,5),mask=np.arange(20).reshape(4,5) % 3)
...
In [16]: np.ma.array(data[mI], mask=mI.mask)
Out[16]:
masked_array(data =
[[10 -- -- 16 --]
[-- 22 -- -- 28]
[-- -- 34 -- --]
[40 -- -- 46 --]],
mask =
[[False True True False True]
[ True False True True False]
[ True True False True True]
[False True True False True]],
fill_value = 999999)
Do you really need to combine the indexing and the masking in one operation (and one masked array)? The operation works just as well if the mask is kept separate:
I = np.arange(20).reshape(4,5)
m = (np.arange(20).reshape(4,5) % 3)>0
np.ma.array(data[I], mask=m)
If the masked index entries are not valid (e.g. out of range), you can fill them with something valid (followed by masking if necessary):
data[mI.filled(fill_value=0)]
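Putting the pieces above together, here is a minimal sketch of a helper that fills the masked index entries, indexes normally, and then re-applies the index's mask. The name take_masked and the default fill value 0 are my own choices for illustration, not anything NumPy provides, and the sketch ignores any mask that data itself might carry.

```python
import numpy as np

def take_masked(data, masked_index, fill=0):
    """Index `data` with `masked_index`, carrying the index's mask over.

    Hypothetical helper (not part of NumPy). `fill` must be a valid
    index into `data`; it replaces masked index entries before indexing.
    Any mask on `data` itself is ignored in this sketch.
    """
    idx = np.ma.filled(masked_index, fill)    # plain ndarray of safe indices
    mask = np.ma.getmaskarray(masked_index)   # full boolean mask, same shape as the index
    return np.ma.array(np.asarray(data)[idx], mask=mask)

data = np.arange(10, 60, 2)
raw = np.arange(20).reshape(4, 5)
mI = np.ma.array(raw, mask=raw % 3)

result = take_masked(data, mI)
print(result)

# A masked entry may even be out of range, because it is filled before indexing.
bad = np.ma.array([0, 99, 3], mask=[False, True, False])
safe = take_masked(data, bad)
print(safe)
```

Keeping the fill step inside the helper means a masked, out-of-range index never reaches the actual indexing operation.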
Have you seen, in the numpy masked array docs, an example of using a masked array to index another array? Or are the masked arrays there always the data? Perhaps the designers never intended masked arrays to be used as indices.
The masked array .choose works because it is a method that has been overridden for masked arrays. Regular indexing probably converts the index to a regular array, with something like data[np.asarray(mI)].
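A quick check of that guess, as a sketch rather than a statement about NumPy internals: the result of indexing with the masked index matches the result of indexing with its underlying data, and nothing in the result is masked.

```python
import numpy as np

data = np.arange(10, 60, 2)
raw = np.arange(20).reshape(4, 5)
mI = np.ma.array(raw, mask=raw % 3)

plain_index = np.asarray(mI)     # ordinary ndarray; the mask is dropped
via_masked = data[mI]            # index with the masked array directly
via_plain = data[plain_index]    # index with the underlying data only

same_values = np.array_equal(np.asarray(via_masked), via_plain)
any_masked = np.ma.is_masked(via_masked)
print(same_values, any_masked)
```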
The __getitem__ method of the MaskedArray class starts like this:
def __getitem__(self, indx):
    """
    Return the item described by i, as a masked array.
    """
    # This test is useful, but we should keep things light...
    # if getmask(indx) is not nomask:
    #     msg = "Masked arrays must be filled before they can be used as indices!"
    #     raise IndexError(msg)
This is the method that is called when you use [] on a masked array. Evidently the developer(s) thought about formally blocking the use of a masked index, but decided that it was not an important enough issue. See numpy/ma/core.py for details.
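If you would rather get an error than a silently ignored mask, that commented-out test can be approximated in your own code. This is only a sketch: strict_index is a hypothetical wrapper of my own, not something NumPy ships, and it uses np.ma.is_masked (are any entries actually masked?) rather than the stricter "getmask(indx) is not nomask" from the source.

```python
import numpy as np

def strict_index(data, indx):
    """Index `data` with `indx`, refusing indices that have masked entries.

    Hypothetical wrapper (not NumPy API) that approximates the check
    commented out in MaskedArray.__getitem__.
    """
    if np.ma.is_masked(indx):
        raise IndexError(
            "Masked arrays must be filled before they can be used as indices!"
        )
    # getdata returns the underlying ndarray for masked and plain inputs alike.
    return data[np.ma.getdata(indx)]

data = np.ma.arange(10, 60, 2)
raw = np.arange(20).reshape(4, 5)
mI = np.ma.array(raw, mask=raw % 3)

try:
    strict_index(data, mI)
except IndexError as exc:
    print("refused:", exc)

# Filling the index first makes the intent explicit and passes the check.
filled_result = strict_index(data, mI.filled(0))
print(filled_result)
```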
Try using the choose method like this:
np.ma.array(np.arange(20).reshape(4, 5), mask=np.arange(20).reshape(4, 5) % 3).choose(np.ma.arange(10, 60, 2))
which gives:
masked_array(data =
[[10 -- -- 16 --]
[-- 22 -- -- 28]
[-- -- 34 -- --]
[40 -- -- 46 --]],
mask =
[[False True True False True]
[ True False True True False]
[ True True False True True]
[False True True False True]],
fill_value = 999999)