Why does the assignment of a boolean indexed structured array depend on the ordering of the index?

I recently saw a phenomenon in working with numpy structured arrays that doesn't make sense. I hope someone can help me understand what is going on. I have provided a minimal working example to illustrate the problem. The problem is this:

When indexing a numpy structured array with boolean mask, this works:

arr['fieldName'][boolMask] += val

      

but the following does not:

arr[boolMask]['fieldName'] += val

      

Here's a minimal working example:

import numpy as np

myDtype = np.dtype([('t','<f8'),('p','<f8',(3,)),('v','<f4',(3,))])

nominalArray = np.zeros((10,),dtype=myDtype)
nominalArray['t'] = np.arange(10.)
# In real life, the other fields would also be populated
print("original times: {0}".format(nominalArray['t']))

# Add 10 to all times greater than 5
timeGreaterThan5 = nominalArray['t'] > 5
nominalArray['t'][timeGreaterThan5] += 10.
print("times after first operation: {0}".format(nominalArray['t']))

# Try to return those times to their original values (this has no effect)
nominalArray[timeGreaterThan5]['t'] -= 10.
print("times after second operation: {0}".format(nominalArray['t']))

      

Doing this produces the following output:

original times: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
times after first operation: [  0.   1.   2.   3.   4.   5.  16.  17.  18.  19.]
times after second operation: [  0.   1.   2.   3.   4.   5.  16.  17.  18.  19.]

      

We can clearly see here that the second operation had no effect. If anyone can explain why this is happening, we would be very grateful.

1 answer


This is really a copy vs. view issue, but let me go into more detail.

The key difference between a view and a copy is whether the indexing pattern is regular or not. A regular pattern can be expressed in terms of an array's shape, strides and dtype. In general, a boolean index (or a list of indices) cannot be expressed in these terms, so numpy must return a copy.
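As a small sketch of that distinction (the array and variable names here are made up for illustration), np.shares_memory can be used to check whether an indexing result still refers to the original buffer:

```python
import numpy as np

a = np.arange(10)

sliced = a[2:8:2]      # regular pattern: expressible with shape and strides
masked = a[a > 5]      # boolean mask: irregular in general
fancy = a[[1, 4, 7]]   # list of indices: also irregular

print(np.shares_memory(a, sliced))  # True  (view)
print(np.shares_memory(a, masked))  # False (copy)
print(np.shares_memory(a, fancy))   # False (copy)
print(sliced.base is a)             # True: the view keeps a reference to a
```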

I like to look at the arr.__array_interface__ property. It shows the shape, strides and a pointer to the data buffer. If the pointer is the same as the original's, the array is a view.
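A minimal sketch of this pointer check follows. The helper data_pointer is a hypothetical name introduced here, not a numpy function. Note that a view starting at an offset reports a shifted pointer, so equal pointers indicate a view, but unequal pointers alone do not prove a copy:

```python
import numpy as np

def data_pointer(x):
    # First element of the 'data' entry is the buffer address as an integer
    return x.__array_interface__['data'][0]

a = np.arange(10)

same_start = a[:5]       # view starting at the same address
offset_view = a[2:]      # view starting two elements further in
mask_copy = a[a >= 0]    # boolean indexing: new buffer

print(data_pointer(same_start) == data_pointer(a))   # True
print(data_pointer(mask_copy) == data_pointer(a))    # False
# The offset view points into the same buffer, shifted by two items
print(data_pointer(offset_view) - data_pointer(a) == 2 * a.itemsize)  # True
```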

With arr[idx] += 1, the indexing is actually a setitem, which selects which elements of the data buffer will be modified by the addition. The distinction between view and copy does not apply.

But with arr[idx1][idx2] += 1, the first indexing is a getitem. For this, the distinction between view and copy is important. The second indexing modifies the array created by the first. If that array is a view, the change affects the original data; if it is a copy, nothing happens to the original. The copy is modified, but it is then discarded and garbage collected.

With a 2d array, you can combine these two indexing steps as arr[idx1, idx2] += 1; this is actually the preferred syntax.
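For instance, a rough sketch with a small 2d array (grid and rows are names chosen here for illustration) shows the chained form leaving the array untouched and the combined form updating it:

```python
import numpy as np

grid = np.zeros((4, 3))
rows = np.array([True, False, True, False])

# Chained: grid[rows] is a copy, so the in-place add is lost
grid[rows][:, 1] += 1
chained_sum = grid.sum()
print(chained_sum)      # 0.0

# Combined: one setitem call on grid itself
grid[rows, 1] += 1
print(grid[:, 1])       # column 1 now holds 1.0 in rows 0 and 2
```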

With structured arrays, field indexing is similar to column indexing, but not exactly the same. For one thing, it cannot be combined with element indexing in a single index.

Simple structured array:

In [234]: arr=np.ones((5,),dtype='i,f,i,f')
In [235]: arr.__array_interface__
{'strides': None,
 'shape': (5,),
 'data': (152524816, False),
 'descr': [('f0', '<i4'), ('f1', '<f4'), ('f2', '<i4'), ('f3', '<f4')],
 'typestr': '|V16',
 'version': 3}

      



Selecting one field creates a view, with the same data pointer:

In [236]: arr['f0'].__array_interface__['data']
Out[236]: (152524816, False)

      

Selecting elements with a boolean mask creates a copy, with a different data pointer:

In [242]: idx = np.array([1,0,0,1,1],bool)
In [243]: arr[idx].__array_interface__['data']
Out[243]: (152629520, False)

      

So arr['f0'][idx] += 1 changes the selected elements of field f0.

arr[idx]['f0'] += 1 changes field f0 of the copy, without affecting arr.

arr[idx]['f0'] + 1 and arr['f0'][idx] + 1 display the same thing, but they don't attempt any in-place changes.
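Applied to the array from the question, this suggests selecting the field first and applying the mask second. The following is a sketch using the question's dtype, with variable names shortened here for readability:

```python
import numpy as np

my_dtype = np.dtype([('t', '<f8'), ('p', '<f8', (3,)), ('v', '<f4', (3,))])
arr = np.zeros((10,), dtype=my_dtype)
arr['t'] = np.arange(10.)

mask = arr['t'] > 5

arr['t'][mask] += 10.0     # field view first, then masked setitem: works
after_add = arr['t'].copy()

arr[mask]['t'] -= 10.0     # mask first: modifies a temporary copy only
after_chained = arr['t'].copy()

arr['t'][mask] -= 10.0     # field first again: restores the original times
restored = arr['t'].copy()

print(np.array_equal(after_chained, after_add))   # True: no effect
print(np.array_equal(restored, np.arange(10.)))   # True: restored
```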

You can select multiple fields of a structured array with arr[['f0','f2']]. In the numpy version used here this is a copy (and I get a warning suggesting that I make an explicit copy); since NumPy 1.16, multi-field indexing returns a view instead.
