Efficiently inserting unaligned elements into a numpy array

I am using numpy 1.9 to work with a set of arrays. Assuming I have something like this, I have two arrays 2d A

and B

and a 1st array C

that looks like this:

>>> A
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
>>> B
array([[-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.]])
>>> C
array([1, 3, 2, 4, 0])

      

My goal is to insert A of all elements from B, according to C. More specifically, if C at position 0 has 1, B [0, 1] should be inserted after A [0, 1].

Here's the expected output:

array([[ 1,  1, -1,  1,  1,  1],
       [ 1,  1,  1,  1, -1,  1],
       [ 1,  1,  1, -1,  1,  1],
       [ 1,  1,  1,  1,  1, -1],
       [ 1, -1,  1,  1,  1,  1]])

      

I tried to implement it like this, but it's not very fast:

for i in xrange(size(C, 0)):
    j = C[i]
    A[i, :] = numpy.insert(A[i], j, B[i, j])

      

Is there a way to make it faster? (do it with a single numpy operation like mask or something)

+3


source to share


3 answers


How about a nasty one-liner?

First, the data; the arrays have the same shape as yours, but I've used integers to make the example easier to read.

In [81]: A
Out[81]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [82]: B
Out[82]: 
array([[   0,  100,  200,  300,  400],
       [ 500,  600,  700,  800,  900],
       [1000, 1100, 1200, 1300, 1400],
       [1500, 1600, 1700, 1800, 1900],
       [2000, 2100, 2200, 2300, 2400]])

In [83]: C
Out[83]: array([1, 3, 2, 4, 0])

      

And here's the disgusting one-liner:

In [84]: np.insert(A.ravel(), np.ravel_multi_index((range(A.shape[0]), C), A.shape) + 1, B[range(B.shape[0]), C]).reshape(A.shape[0], A.shape[1]+1)
Out[84]: 
array([[   0,    1,  100,    2,    3,    4],
       [   5,    6,    7,    8,  800,    9],
       [  10,   11,   12, 1200,   13,   14],
       [  15,   16,   17,   18,   19, 1900],
       [  20, 2000,   21,   22,   23,   24]])

      

Here's the broken version:

A.ravel()

flattens A

into the 1st array I will call F

:

In [87]: F = A.ravel()

In [88]: F
Out[88]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

      

(EDIT: It turns out that this first step - anti-aliasing A

- is not required. As @hpaulj points out in his answer, np.insert

the default will be to flatten the array.)



np.ravel_multi_index

used to convert desired 2D positions to indices into a flattened array. + 1

at the end is necessary because you want to insert items after the index given in C

:

In [89]: insert_indices = np.ravel_multi_index((range(A.shape[0]), C), A.shape) + 1

In [90]: insert_indices
Out[90]: array([ 2,  9, 13, 20, 21])

      

B[range(B.shape[0]), C]

extracts the desired values ​​from B

:

In [91]: values = B[range(B.shape[0]), C]

In [92]: values
Out[92]: array([ 100,  800, 1200, 1900, 2000])

      

np.insert

performs the actual insert and creates a new array:

In [93]: np.insert(F, insert_indices, values)
Out[93]: 
array([   0,    1,  100,    2,    3,    4,    5,    6,    7,    8,  800,
          9,   10,   11,   12, 1200,   13,   14,   15,   16,   17,   18,
         19, 1900,   20, 2000,   21,   22,   23,   24])

      

Now just change this to get the final result:

In [94]: np.insert(F, insert_indices, values).reshape(A.shape[0], A.shape[1]+1)
Out[94]: 
array([[   0,    1,  100,    2,    3,    4],
       [   5,    6,    7,    8,  800,    9],
       [  10,   11,   12, 1200,   13,   14],
       [  15,   16,   17,   18,   19, 1900],
       [  20, 2000,   21,   22,   23,   24]])

      

+5


source


First, a few slightly legible arrays:

>>> A
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
>>> B
array([[-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.],
       [-1., -1., -1., -1., -1.]])
>>> C
array([1, 3, 2, 4, 0])

      

Next, the shenanigans mask:

>>> ge_mask = C.reshape(-1, 1) >= numpy.arange(5)
>>> eq_mask = C.reshape(-1, 1) == numpy.arange(5)
>>> lt_mask = C.reshape(-1, 1) < numpy.arange(5)

      



And coup de grâce:

>>> result = numpy.zeros((A.shape[0], A.shape[1] + 1))
>>> result[:,0:5][ge_mask] = A[ge_mask]
>>> result[:,1:6][eq_mask] = B[eq_mask]
>>> result[:,1:6][lt_mask] = A[lt_mask]
>>> result
array([[ 1.,  1., -1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1., -1.,  1.],
       [ 1.,  1.,  1., -1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1., -1.],
       [ 1., -1.,  1.,  1.,  1.,  1.]])

      

Warren posted the just posted answer, it seems like it might be better memory-wise. Not sure about the speed. (I really think this is somewhat legible!)

+3


source


I believe this is the fixed iteration:

A=np.arange(25).reshape(5,5)
B=np.arange(25).reshape(5,5)*-1
C=np.array([1,3,2,4,0])

A2=np.zeros((5,6),dtype=int)
for i,c in enumerate(C):
    A2[i,:]=np.insert(A[i],c+1,B[i,c])

      

production:

array([[  0,   1,  -1,   2,   3,   4],
       [  5,   6,   7,   8,  -8,   9],
       [ 10,  11,  12, -12,  13,  14],
       [ 15,  16,  17,  18,  19, -19],
       [ 20, -20,  21,  22,  23,  24]])

      

This can be turned into a single liner like:

 np.array([np.insert(a, c+1, b[c]) for  a,b,c in zip(A,B,C)])

      

Equivalent terms in Warren's answer:

c <=> c = np.ravel_multi_index((range(5), C), (5,5))
b <=> B.ravel()[c]
np.insert(A, c+1, B.ravel()[c]).reshape(5,6)

      

np.insert

ravels A

by default. For this small example, this one ravel_multi_index

is 2x faster than iterating over a string.

+3


source







All Articles