Sum the rows of a numpy array where the starting index of each sum comes from another array

I have an NxM numpy array called data. I also have a length-N array called start_indices. I want a new array of length N where the i-th element is sum(data[i][start_indices[i]:]).

Here's one way to do it:

import numpy as np
data = np.linspace(0, 11, 12).reshape((3, 4))
# data is a float array:
# array([[  0.,   1.,   2.,   3.],
#        [  4.,   5.,   6.,   7.],
#        [  8.,   9.,  10.,  11.]])
start_indices = np.array([0, 1, 2])
sums = []
for start_index, row in zip(start_indices, data):
    # sum each row from its own start index to the end
    sums.append(np.sum(row[start_index:]))
sums = np.array(sums)  # array([  6.,  18.,  21.])

      

Is there a more numpythonic way to do this?

+3




3 answers


You can build a boolean mask array by broadcasting the start indices against the column numbers:

>>> mask = start_indices[:,None] <= np.arange(data.shape[1])
>>> (data * mask).sum(axis=1)
array([  6.,  18.,  21.])
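To make the broadcasting step concrete, here is a minimal sketch that builds the mask and applies it in two ways. The names columns, masked_sums and where_sums are illustrative only. The np.where variant is a swapped-in alternative, not part of the answer itself; it may be preferable when data can hold NaN or inf before a start index, because NaN * 0 is still NaN.

```python
import numpy as np

data = np.linspace(0, 11, 12).reshape((3, 4))
start_indices = np.array([0, 1, 2])

# Column numbers 0..M-1; comparing them with each row's start index
# broadcasts a (3, 1) array against a (4,) array into a (3, 4) mask.
columns = np.arange(data.shape[1])
mask = start_indices[:, None] <= columns

print(mask)
# [[ True  True  True  True]
#  [False  True  True  True]
#  [False False  True  True]]

# Multiplying by the mask zeroes the elements before each start index.
masked_sums = (data * mask).sum(axis=1)

# Alternative (an assumption, not from the answer): select instead of multiply.
where_sums = np.where(mask, data, 0).sum(axis=1)

print(masked_sums, where_sums)
```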

      

For the final step, you can also use np.einsum:

>>> np.einsum('ij,ij->i', data, mask)
array([  6.,  18.,  21.])

      



Note, however, that a mask array can be inefficient here: it processes every element of data, including those before each start index that end up contributing nothing.

Alternatively, use np.fromiter:

>>> it = (r[i:].sum() for r, i in zip(data, start_indices))
>>> np.fromiter(it, data.dtype)
array([  6.,  18.,  21.])

      

+6




Besides zip iteration (in its several forms) and the masked sum, cumsum might be worth testing:

data[:,::-1].cumsum(axis=1)[range(data.shape[0]), data.shape[1]-1-start_indices]

      

Taking cumsum along the right axis is the easy part; most of the expression goes into picking out the desired sums.
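The one-liner can be unpacked into steps, as in the sketch below. The intermediate names suffix_sums and pick are illustrative and not from the answer. Reversing each row before accumulating means column k holds the sum of the last k+1 elements of that row, so the tail starting at index s sits in column M - 1 - s.

```python
import numpy as np

data = np.linspace(0, 11, 12).reshape((3, 4))
start_indices = np.array([0, 1, 2])
n_rows, n_cols = data.shape

# Reverse the columns, then accumulate along each row:
# column k now holds the sum of the last k+1 elements of the original row.
suffix_sums = data[:, ::-1].cumsum(axis=1)

# A tail starting at index s has n_cols - s elements,
# so its sum is stored in column n_cols - 1 - s.
pick = n_cols - 1 - start_indices

# Fancy indexing pairs row i with column pick[i].
sums = suffix_sums[np.arange(n_rows), pick]
print(sums)
```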

In this small case it is faster than the zip iteration but slower than the masked sum. The ranking may change with array size, though.
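One way to check how the ranking shifts is a small timeit harness like the sketch below. The function names, the array sizes and the repeat count are arbitrary choices for illustration, and the timings depend on the machine and numpy version, so no results are quoted here.

```python
import timeit
import numpy as np

def zip_sums(data, start_indices):
    # Plain Python iteration over the rows.
    return np.array([row[i:].sum() for row, i in zip(data, start_indices)])

def mask_sums(data, start_indices):
    # Boolean mask built by broadcasting, then a row-wise sum.
    mask = start_indices[:, None] <= np.arange(data.shape[1])
    return (data * mask).sum(axis=1)

def cumsum_sums(data, start_indices):
    # Reversed cumulative sum, then one pick per row.
    n_rows, n_cols = data.shape
    reversed_cumsum = data[:, ::-1].cumsum(axis=1)
    return reversed_cumsum[np.arange(n_rows), n_cols - 1 - start_indices]

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
for n_rows, n_cols in [(3, 4), (300, 300)]:
    data = rng.random((n_rows, n_cols))
    start_indices = rng.integers(0, n_cols, size=n_rows)
    for fn in (zip_sums, mask_sums, cumsum_sums):
        seconds = timeit.timeit(lambda: fn(data, start_indices), number=10)
        print(f"{n_rows}x{n_cols} {fn.__name__}: {seconds:.5f} s for 10 calls")
```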

I don't think any of these alternatives is more "Pythonic"; they all use accepted Python methods. Those that avoid the zip iteration may earn numpy brownie points, but only if they improve speed where it matters.



np.add.reduceat promises even better speed (a first cut, not general):

np.add.reduceat(data.ravel(),[0,4,5,8,10])[::2]

      

That is only a test expression, and it does not count the time needed to build the list of indices:

indices = np.array([0,4,4,8,8]); indices[::2] += start_indices
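A more general version of the reduceat idea might look like the sketch below. The function name tail_sums_reduceat is hypothetical, and the code assumes a 2-D array with every start index strictly less than the number of columns; an empty tail (start index equal to M) is not handled, because reduceat returns a single element rather than 0 when two consecutive indices coincide.

```python
import numpy as np

def tail_sums_reduceat(data, start_indices):
    """Sum each row of `data` from its start index to the row's end.

    Sketch only: assumes 0 <= start_indices[i] < data.shape[1] for all i.
    """
    data = np.asarray(data)
    start_indices = np.asarray(start_indices)
    n_rows, n_cols = data.shape
    if np.any(start_indices < 0) or np.any(start_indices >= n_cols):
        raise ValueError("start indices must lie in [0, n_cols)")

    row_starts = np.arange(n_rows) * n_cols
    indices = np.empty(2 * n_rows, dtype=np.intp)
    indices[0::2] = row_starts + start_indices  # where each wanted segment begins
    indices[1::2] = row_starts + n_cols         # where each row ends

    # The final entry equals data.size, which reduceat rejects as out of
    # bounds; dropping it lets the last segment run to the end of the array.
    segment_sums = np.add.reduceat(data.ravel(), indices[:-1])

    # Even positions hold the wanted tails; odd positions hold the
    # unwanted stretch up to the next row's start index.
    return segment_sums[::2]

data = np.linspace(0, 11, 12).reshape((3, 4))
start_indices = np.array([0, 1, 2])
print(tail_sums_reduceat(data, start_indices))
```

For the example data the index list comes out as [0, 4, 5, 8, 10], matching the hand-written expression.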

      

+2




sums = np.array( [data[i, start_indices[i]:].sum() for i in range(data.shape[0])] )

      

0

