Efficiently inserting unaligned elements into a numpy array
I am using numpy 1.9 to work with a set of arrays. Assuming I have something like this, I have two arrays 2d A
and B
and a 1st array C
that looks like this:
>>> A
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
>>> B
array([[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.]])
>>> C
array([1, 3, 2, 4, 0])
My goal is to insert A of all elements from B, according to C. More specifically, if C at position 0 has 1, B [0, 1] should be inserted after A [0, 1].
Here's the expected output:
array([[ 1, 1, -1, 1, 1, 1],
[ 1, 1, 1, 1, -1, 1],
[ 1, 1, 1, -1, 1, 1],
[ 1, 1, 1, 1, 1, -1],
[ 1, -1, 1, 1, 1, 1]])
I tried to implement it like this, but it's not very fast:
for i in xrange(size(C, 0)):
j = C[i]
A[i, :] = numpy.insert(A[i], j, B[i, j])
Is there a way to make it faster? (do it with a single numpy operation like mask or something)
source to share
How about a nasty one-liner?
First, the data; the arrays have the same shape as yours, but I've used integers to make the example easier to read.
In [81]: A
Out[81]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
In [82]: B
Out[82]:
array([[ 0, 100, 200, 300, 400],
[ 500, 600, 700, 800, 900],
[1000, 1100, 1200, 1300, 1400],
[1500, 1600, 1700, 1800, 1900],
[2000, 2100, 2200, 2300, 2400]])
In [83]: C
Out[83]: array([1, 3, 2, 4, 0])
And here's the disgusting one-liner:
In [84]: np.insert(A.ravel(), np.ravel_multi_index((range(A.shape[0]), C), A.shape) + 1, B[range(B.shape[0]), C]).reshape(A.shape[0], A.shape[1]+1)
Out[84]:
array([[ 0, 1, 100, 2, 3, 4],
[ 5, 6, 7, 8, 800, 9],
[ 10, 11, 12, 1200, 13, 14],
[ 15, 16, 17, 18, 19, 1900],
[ 20, 2000, 21, 22, 23, 24]])
Here's the broken version:
A.ravel()
flattens A
into the 1st array I will call F
:
In [87]: F = A.ravel()
In [88]: F
Out[88]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])
(EDIT: It turns out that this first step - anti-aliasing A
- is not required. As @hpaulj points out in his answer, np.insert
the default will be to flatten the array.)
np.ravel_multi_index
used to convert desired 2D positions to indices into a flattened array. + 1
at the end is necessary because you want to insert items after the index given in C
:
In [89]: insert_indices = np.ravel_multi_index((range(A.shape[0]), C), A.shape) + 1
In [90]: insert_indices
Out[90]: array([ 2, 9, 13, 20, 21])
B[range(B.shape[0]), C]
extracts the desired values from B
:
In [91]: values = B[range(B.shape[0]), C]
In [92]: values
Out[92]: array([ 100, 800, 1200, 1900, 2000])
np.insert
performs the actual insert and creates a new array:
In [93]: np.insert(F, insert_indices, values)
Out[93]:
array([ 0, 1, 100, 2, 3, 4, 5, 6, 7, 8, 800,
9, 10, 11, 12, 1200, 13, 14, 15, 16, 17, 18,
19, 1900, 20, 2000, 21, 22, 23, 24])
Now just change this to get the final result:
In [94]: np.insert(F, insert_indices, values).reshape(A.shape[0], A.shape[1]+1)
Out[94]:
array([[ 0, 1, 100, 2, 3, 4],
[ 5, 6, 7, 8, 800, 9],
[ 10, 11, 12, 1200, 13, 14],
[ 15, 16, 17, 18, 19, 1900],
[ 20, 2000, 21, 22, 23, 24]])
source to share
First, a few slightly legible arrays:
>>> A
array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
>>> B
array([[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.],
[-1., -1., -1., -1., -1.]])
>>> C
array([1, 3, 2, 4, 0])
Next, the shenanigans mask:
>>> ge_mask = C.reshape(-1, 1) >= numpy.arange(5)
>>> eq_mask = C.reshape(-1, 1) == numpy.arange(5)
>>> lt_mask = C.reshape(-1, 1) < numpy.arange(5)
And coup de grâce:
>>> result = numpy.zeros((A.shape[0], A.shape[1] + 1))
>>> result[:,0:5][ge_mask] = A[ge_mask]
>>> result[:,1:6][eq_mask] = B[eq_mask]
>>> result[:,1:6][lt_mask] = A[lt_mask]
>>> result
array([[ 1., 1., -1., 1., 1., 1.],
[ 1., 1., 1., 1., -1., 1.],
[ 1., 1., 1., -1., 1., 1.],
[ 1., 1., 1., 1., 1., -1.],
[ 1., -1., 1., 1., 1., 1.]])
source to share
I believe this is the fixed iteration:
A=np.arange(25).reshape(5,5) B=np.arange(25).reshape(5,5)*-1 C=np.array([1,3,2,4,0]) A2=np.zeros((5,6),dtype=int) for i,c in enumerate(C): A2[i,:]=np.insert(A[i],c+1,B[i,c])
production:
array([[ 0, 1, -1, 2, 3, 4],
[ 5, 6, 7, 8, -8, 9],
[ 10, 11, 12, -12, 13, 14],
[ 15, 16, 17, 18, 19, -19],
[ 20, -20, 21, 22, 23, 24]])
This can be turned into a single liner like:
np.array([np.insert(a, c+1, b[c]) for a,b,c in zip(A,B,C)])
Equivalent terms in Warren's answer:
c <=> c = np.ravel_multi_index((range(5), C), (5,5))
b <=> B.ravel()[c]
np.insert(A, c+1, B.ravel()[c]).reshape(5,6)
np.insert
ravels A
by default. For this small example, this one ravel_multi_index
is 2x faster than iterating over a string.
source to share