Why does the assignment of a boolean indexed structured array depend on the ordering of the index?
I recently saw a phenomenon in working with numpy structured arrays that doesn't make sense. I hope someone can help me understand what is going on. I have provided a minimal working example to illustrate the problem. The problem is this:
When indexing a numpy structured array with boolean mask, this works:
arr['fieldName'][boolMask] += val
but the following does not:
arr[boolMask]['fieldName'] += val
Here's a minimal working example:
import numpy as np
myDtype = np.dtype([('t','<f8'),('p','<f8',(3,)),('v','<f4',(3,))])
nominalArray = np.zeros((10,),dtype=myDtype)
nominalArray['t'] = np.arange(10.)
# In real life, the other fields would also be populated
print("original times: {0}".format(nominalArray['t']))
# Add 10 to all times greater than 5
timeGreaterThan5 = nominalArray['t'] > 5
nominalArray['t'][timeGreaterThan5] += 10.
print("times after first operation: {0}".format(nominalArray['t']))
# Return those times to their original values
nominalArray[timeGreaterThan5]['t'] -= 10.
print("times after second operation: {0}".format(nominalArray['t']))
Doing this produces the following output:
original times: [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
times after first operation: [ 0. 1. 2. 3. 4. 5. 16. 17. 18. 19.]
times after second operation: [ 0. 1. 2. 3. 4. 5. 16. 17. 18. 19.]
We can clearly see here that the second operation had no effect. If anyone can explain why this is happening, I would be very grateful.
This is really a copy vs. view problem, but I'll go into more detail.
The key difference between a view and a copy is whether the indexing pattern is regular or not. A regular pattern can be expressed in terms of the array's shape, strides and dtype. In general, a boolean index (or an index list) cannot be expressed in those terms, so numpy must return a copy.
I like to look at the arr.__array_interface__ property. It shows the shape, strides and a pointer to the data buffer. If the pointer is the same as the original's, it is a view.
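As a small sketch of that check, the snippet below compares data pointers and, as a cross-check, calls np.shares_memory. The helper name data_pointer and the variable names are made up for illustration; they are not from the original post.

```python
import numpy as np

arr = np.ones((5,), dtype='i,f,i,f')
mask = np.array([True, False, False, True, True])

def data_pointer(a):
    """Address of the first byte of a's data buffer (hypothetical helper)."""
    return a.__array_interface__['data'][0]

field = arr['f0']    # single-field selection
picked = arr[mask]   # boolean-mask selection

# Pointer comparison, as described above.
same_ptr_field = data_pointer(field) == data_pointer(arr)
same_ptr_picked = data_pointer(picked) == data_pointer(arr)

# Cross-check: does the result overlap arr's memory at all?
shares_field = np.shares_memory(arr, field)
shares_picked = np.shares_memory(arr, picked)

print(same_ptr_field, shares_field)    # True True   (view)
print(same_ptr_picked, shares_picked)  # False False (copy)
```

Note that the pointer comparison only matches exactly when the view starts at the first byte of the original buffer (as the first field does here); np.shares_memory also covers views that start further in.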
With arr[idx] += 1, the indexing is in effect a setitem operation, which selects which elements of the data buffer will be modified by the addition. The distinction between view and copy does not apply.
But with arr[idx1][idx2] += 1, the first indexing is a getitem operation. For this, the distinction between view and copy is important. The second indexing modifies the array produced by the first. If that array is a view, the change affects the original data; if it is a copy, nothing happens to the original. The copy is modified, but it is then discarded and garbage collected.
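A minimal sketch of the difference, using a plain 1d array rather than the arrays from the question (the names here are illustrative only). The first index is a slice in one case and a boolean mask in the other:

```python
import numpy as np

arr = np.zeros(6)

# First index is a slice: getitem returns a view,
# so the chained update reaches arr.
arr[2:][:2] += 1
after_slice = arr.copy()

# First index is a boolean mask: getitem returns a copy,
# so the chained update is applied to a temporary and lost.
mask = np.array([True, True, False, False, False, False])
arr[mask][:2] += 1
after_mask = arr.copy()

# A single indexing step with the same mask is a setitem on arr itself.
arr[mask] += 1
after_setitem = arr.copy()

print(after_slice)    # [0. 0. 1. 1. 0. 0.]
print(after_mask)     # [0. 0. 1. 1. 0. 0.]
print(after_setitem)  # [1. 1. 1. 1. 0. 0.]
```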
With a 2d array, you can combine these two indexing steps into one, arr[idx1, idx2] += 1; this is in fact the preferred syntax.
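For illustration, here is a sketch on a small 2d array (grid and rows are made-up names), comparing the chained form with the combined form:

```python
import numpy as np

grid = np.zeros((4, 3))
rows = np.array([True, False, True, False])

# Chained: grid[rows] is a copy, so incrementing its column 1 is lost.
grid[rows][:, 1] += 1
chained_total = grid.sum()

# Combined: one setitem on grid, selecting column 1 of the masked rows.
grid[rows, 1] += 1
combined_column = grid[:, 1].copy()

print(chained_total)    # 0.0
print(combined_column)  # [1. 0. 1. 0.]
```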
With structured arrays, field indexing is similar to column indexing, but not exactly the same. For one thing, it cannot be combined with element indexing in a single step.
Simple structured array:
In [234]: arr=np.ones((5,),dtype='i,f,i,f')
In [235]: arr.__array_interface__
{'strides': None,
'shape': (5,),
'data': (152524816, False),
'descr': [('f0', '<i4'), ('f1', '<f4'), ('f2', '<i4'), ('f3', '<f4')],
'typestr': '|V16',
'version': 3}
Selecting one field creates a view, with the same data pointer:
In [236]: arr['f0'].__array_interface__['data']
Out[236]: (152524816, False)
Selecting elements with a boolean mask creates a copy, with a different data pointer:
In [242]: idx = np.array([1,0,0,1,1],bool)
In [243]: arr[idx].__array_interface__['data']
Out[243]: (152629520, False)
So arr['f0'][idx] += 1 changes the selected elements of the field f0.
arr[idx]['f0'] += 1 changes the f0 field of the copy, without affecting arr.
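Applied to the array from the question, this might look like the sketch below. The variable names are shortened for illustration, and the write-back at the end is just one possible alternative when you want to work on the selected records as a group, not something the original post proposes.

```python
import numpy as np

my_dtype = np.dtype([('t', '<f8'), ('p', '<f8', (3,)), ('v', '<f4', (3,))])
nominal = np.zeros((10,), dtype=my_dtype)
nominal['t'] = np.arange(10.)
late = nominal['t'] > 5

nominal['t'][late] += 10.   # field first (view), then mask: modifies nominal
nominal[late]['t'] -= 10.   # mask first (copy): nominal is unchanged
still_shifted = nominal['t'].copy()

nominal['t'][late] -= 10.   # field first again: restores the original times
restored = nominal['t'].copy()

# Alternative: modify the copy explicitly, then assign it back.
sub = nominal[late]
sub['t'] += 10.
nominal[late] = sub
written_back = nominal['t'].copy()

print(still_shifted)  # [ 0.  1.  2.  3.  4.  5. 16. 17. 18. 19.]
print(restored)       # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
```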
arr[idx]['f0'] + 1 and arr['f0'][idx] + 1 display the same thing, but they don't attempt any in-place changes.
You can select multiple fields of a structured array with arr[['f0','f2']]. In the NumPy version used here this is a copy (and I get a warning suggesting that I make an explicit copy); NumPy 1.16 and later return a view instead.