Numpy: cumulative multiplicity rate

Question

Numpy: cumulative multiplicity rate

I have a sorted array of ints that can have repetitions. I would like to count consecutive equal values, restarting from zero when the value is different from the previous one. This is the expected output, implemented with a simple python loop:

import numpy as np

def count_multiplicities(a):
    r = np.zeros(a.shape, dtype=a.dtype)
    for i in range(1, len(a)):
        if a[i] == a[i-1]:
            r[i] = r[i-1]+1
        else:
            r[i] = 0
    return r

a = (np.random.rand(20)*5).astype(dtype=int)
a.sort()

print "given sorted array: ", a
print "multiplicity count: ", count_multiplicities(a)

Output:

given sorted array:  [0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
multiplicity count:  [0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 0 1 2 3]

How can I get the same result in an efficient way using numpy? The array is very long, but there are only a few repetitions (say, no more than ten).

In my special case, I also know that values start at zero and that the difference between successive values is 0 or 1 (no spaces in the values).

+3

performance python numpy

Emanuele paolini Jul 26 17 at 8:03

source to share

2 answers

I would use numba for problems like this

import numba
nb_count_multiplicities = numba.njit("int32[:](int32[:])")(count_multiplicities)
X=nb_count_multiplicities(a)

Without rewriting your code at all, it's about 50 percent faster than Divakar's vectorial solution.

Veclisation is useful many times if it results in shorter and possibly more comprehensible codes, but if you are determined to vectorize code, which can also be a problem for a fairly experienced numba programmer, this is the way to go.

+1

max9111 Jul 26 17 at 14:23

source to share

Divakar · Accepted Answer · 2017-07-26T08:13:35+0000

Here's one cumsum

based on a vectorized approach -

def count_multiplicities_cumsum_vectorized(a):      
    out = np.ones(a.size,dtype=int)
    idx = np.flatnonzero(a[1:] != a[:-1])+1
    out[idx[0]] = -idx[0] + 1
    out[0] = 0
    out[idx[1:]] = idx[:-1] - idx[1:] + 1
    np.cumsum(out, out=out)
    return out

Example run -

In [58]: a
Out[58]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [59]: count_multiplicities(a) # Original approach
Out[59]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])

In [60]: count_multiplicities_cumsum_vectorized(a)
Out[60]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])

Runtime test -

In [66]: a = (np.random.rand(200000)*1000).astype(dtype=int)
    ...: a.sort()
    ...: 

In [67]: a
Out[67]: array([  0,   0,   0, ..., 999, 999, 999])

In [68]: %timeit count_multiplicities(a)
10 loops, best of 3: 87.2 ms per loop

In [69]: %timeit count_multiplicities_cumsum_vectorized(a)
1000 loops, best of 3: 739 µs per loop

Related post

...

Numpy: cumulative multiplicity rate

More articles: