Sum the rows of a numpy array where the starting index of each sum comes from another array
I have an NxM numpy array called data. I also have a length-N array called start_indices. I want a new array of length N whose i-th element is sum(data[i][start_indices[i]:]).
Here's one way to do it:
import numpy as np
data = np.linspace(0, 11, 12).reshape((3, 4))
data
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])
start_indices = np.array([0, 1, 2])
sums = []
for start_index, row in zip(start_indices, data):
    sums.append(np.sum(row[start_index:]))
sums = np.array(sums)
Is there a more numpythonic way to do this?
You can create a boolean mask array:
>>> mask = start_indices[:,None] <= np.arange(data.shape[1])
>>> (data * mask).sum(axis=1)
array([ 6., 18., 21.])
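To make the broadcasting explicit: start_indices[:,None] has shape (N, 1), np.arange(data.shape[1]) has shape (M,), and comparing them gives an (N, M) boolean array that is True from each row's start index onward. For the example data the mask looks like this (the exact printout may vary with the numpy version):

>>> mask
array([[ True,  True,  True,  True],
       [False,  True,  True,  True],
       [False, False,  True,  True]])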
As a final step, you can also use np.einsum:
>>> np.einsum('ij,ij->i', data, mask)
array([ 6., 18., 21.])
although building the full mask array can be wasteful here, since every element is touched even though many are excluded from the sum.
Alternatively, use np.fromiter:
>>> it = (r[i:].sum() for r, i in zip(data, start_indices))
>>> np.fromiter(it, data.dtype)
array([ 6., 18., 21.])
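If the number of rows is known up front, np.fromiter can also preallocate the result via its optional count argument (a minor refinement, same output):

>>> it = (r[i:].sum() for r, i in zip(data, start_indices))
>>> np.fromiter(it, data.dtype, count=len(data))
array([ 6., 18., 21.])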
Apart from the zip iteration (in its various forms) and the masked sum, cumsum might be worth testing:
data[:,::-1].cumsum(axis=1)[range(data.shape[0]), data.shape[1]-1-start_indices]
The cumsum along the right axis is easy; most of the expression is spent pulling out the desired sums.
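To see why the indexing works, here is an annotated sketch using the same example data (intermediate values shown as comments):

rev = data[:, ::-1]                 # reverse each row: [[ 3, 2, 1, 0], ...]
csum = rev.cumsum(axis=1)           # csum[i, j] = sum of the last j+1 elements of row i
# the sum of row i from start_indices[i] onward therefore sits at column M-1-start_indices[i]
cols = data.shape[1] - 1 - start_indices
result = csum[np.arange(data.shape[0]), cols]
result                              # array([ 6., 18., 21.])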
In this small case it is faster than the zip iteration but slower than the masked sum, though the ranking can change with array size.
I don't think any of these alternatives is more "numpythonic"; they all use approved numpy methods. The ones that avoid the zip iteration may earn brownie points, but only if they improve speed where it matters.
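One way to check that at larger sizes is a small timing harness along these lines (a sketch only; the shape, the random start indices, and the repeat count are arbitrary choices, and the numbers will depend on hardware):

import numpy as np
import timeit

N, M = 1000, 1000
data = np.random.rand(N, M)
start_indices = np.random.randint(0, M, size=N)

def masked_sum():
    mask = start_indices[:, None] <= np.arange(M)
    return (data * mask).sum(axis=1)

def zip_sum():
    return np.array([row[i:].sum() for row, i in zip(data, start_indices)])

def cumsum_sum():
    return data[:, ::-1].cumsum(axis=1)[np.arange(N), M - 1 - start_indices]

for f in (masked_sum, zip_sum, cumsum_sum):
    print(f.__name__, timeit.timeit(f, number=10))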
np.add.reduceat promises even better speed (a first cut, not general):
np.add.reduceat(data.ravel(),[0,4,5,8,10])[::2]
This is a trial expression, and it does not account for the time it takes to build the indices list:
indices = np.array([0,4,4,8,8]); indices[::2] += start_indices
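For reference, the same trick can be written for a general (N, M) array. This is only a sketch, not part of the original answer: the helper name is made up, and it assumes every start index is strictly less than M so that no slice is empty (reduceat treats empty segments differently).

import numpy as np

def reduceat_row_sums(data, start_indices):
    # Generalized reduceat trick: sum each row from its start index onward,
    # assuming 0 <= start_indices[i] < M for every row.
    n, m = data.shape
    row_starts = np.arange(n) * m               # flat offset of each row
    indices = np.empty(2 * n, dtype=np.intp)
    indices[::2] = row_starts + start_indices   # where each wanted segment begins
    indices[1::2] = row_starts + m              # where each row ends
    # drop the final "end of array" index, which reduceat does not accept,
    # then keep every other segment (the in-between segments are fillers)
    return np.add.reduceat(data.ravel(), indices[:-1])[::2]

data = np.linspace(0, 11, 12).reshape((3, 4))
start_indices = np.array([0, 1, 2])
print(reduceat_row_sums(data, start_indices))   # [ 6. 18. 21.]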