Python / NumPy - fill gaps between non-consecutive points?

I'm trying to find a vectorized / fast / NumPy-friendly way to convert the values in column A into column B:

ID  A   B
1   0   0
2   0   0
3   1   0
4   1   1
5   0   1
6   0   1
7   -1  1
8   0   0
9   1   0
10  0   1
11  0   1
12  1   1
13  0   1
14  -1  1
15  0   0

The rule for column B is to fill the gap between each 1 and the following -1 with 1s, skipping the first row of each pair. That is, for ID4-ID7, column B is filled with ones (because of the opening 1 in column A at ID3). Likewise, ID10-ID14 are filled with ones (since column A at ID9 = 1); the extra 1 at ID12 does not start a new block.
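Put differently, B[i] is 1 exactly when the most recent nonzero value of A strictly before row i is 1. That restatement suggests a loop-free forward fill. The sketch below illustrates the idea and assumes A contains only -1, 0 and 1. It is not one of the answers in this thread, and the name `make_y_ffill` is hypothetical:

```python
import numpy as np


def make_y_ffill(x):
    """Hypothetical loop-free sketch: B[i] = 1 iff the last nonzero x[j] with j < i is 1."""
    x = np.asarray(x)
    n = len(x)
    # Each nonzero position contributes its own index, zeros contribute 0;
    # a running maximum then gives the index of the latest nonzero at or before i.
    idx = np.where(x != 0, np.arange(n), 0)
    idx = np.maximum.accumulate(idx)
    last = x[idx]  # latest nonzero value at or before i (0 if there is none yet)
    # B is that state shifted down by one row, so the row holding the opening 1 stays 0
    y = np.zeros_like(x)
    y[1:] = last[:-1] == 1
    return y


x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
print(make_y_ffill(x).tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
```

Every step is a single pass over the array, so the cost grows linearly with its length. Whether it beats the loop in practice is something to measure.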

While it's easy to do this with a for loop, I'm wondering whether there is a loop-free solution. The O(n) loop-based solution is below:

import numpy as np
import pandas as pd

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])


def make_y(x, showminus=False):
    y = np.zeros_like(x)  # output column B, same shape and dtype as x
    state = 0  # are we in a 1, 0 or -1 block?
    for i, n in enumerate(x):
        if n == 1 and n != state:
            # a block opens: the row holding the 1 stays 0, the fill starts on the next row
            state = n
            if i < len(y) - 1:
                y[i + 1] = state
        elif n == -1 and n != state:
            # the block closes: the row holding the -1 is still filled
            y[i] = state
            if showminus:
                state = -1
            else:
                state = 0
        else:
            y[i] = state
    return y


y = make_y(x)
print(pd.DataFrame([x, y]).T)

The above function gives the following performance on my machine:

%timeit y = make_y(x)
10000 loops, best of 3: 28 µs per loop

I suspect there must be some way to speed this up, since I will eventually have to deal with arrays of 10 million+ elements...
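Benchmarking at that scale needs a large input that respects the 1 ... -1 pairing. One way to build it is to draw random, distinct event positions and alternate their values. The sketch below assumes the events are spread uniformly, and the helper `make_test_input` is hypothetical:

```python
import numpy as np


def make_test_input(n, n_events, seed=0):
    """Hypothetical helper: length-n array of 0s with alternating 1 / -1 events."""
    rng = np.random.default_rng(seed)
    n_events -= n_events % 2  # keep an even count so every 1 gets a closing -1
    pos = np.sort(rng.choice(n, size=n_events, replace=False))
    x = np.zeros(n, dtype=np.int64)
    x[pos[0::2]] = 1   # block openers
    x[pos[1::2]] = -1  # block closers
    return x


x_big = make_test_input(1_000_000, 20_000)
```

A closer can land on the last row. That is exactly the trailing -1 case the first answer below warns about.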



2 answers


A possible vectorized solution could be as follows:

idx_1s, = np.where(x == -1)  # find the positions of the -1's
idx1s, = np.where(x == 1)  # find the positions of the 1's

To find which 1s should be turned into 0s, marking the start of each block of 1s:

idx0s = np.concatenate(([0], np.searchsorted(idx1s, idx_1s[:-1])))
idx0s = idx1s[idx0s]
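The sketch below traces these steps on the example from the question, recomputing the two position arrays so it runs on its own. For every -1 except the last, searchsorted returns the index of the first 1 that comes after it. The prepended 0 stands for the very first 1.

```python
import numpy as np

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
idx_1s, = np.where(x == -1)  # [6, 13]
idx1s, = np.where(x == 1)    # [2, 3, 8, 11]

# For the -1 at row 6, the first 1 after it is idx1s[2] == 8
after = np.searchsorted(idx1s, idx_1s[:-1])  # [2]
idx0s = idx1s[np.concatenate(([0], after))]  # rows of the opening 1s
print(idx0s.tolist())  # [2, 8]
```

The 1s at rows 3 and 11 are never selected. That is how repeated 1s inside a block are ignored.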

We now have two arrays of the same length, idx0s and idx_1s, denoting the positions of the first and last elements of each block, so now we can do:



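y = x.copy()
y[idx0s] = 0  # the opening 1 of each block becomes 0
idx0s += 1    # the fill starts one row after the opening 1...
idx_1s += 1   # ...and stops just after the closing -1
mask = np.zeros_like(y, dtype=bool)  # np.bool was removed in NumPy 1.24
mask[idx0s] = True
mask[idx_1s] = True
mask = np.logical_xor.accumulate(mask)  # True between each pair of toggles
y[mask] = 1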

Which gives the desired output:

>>> y
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0])
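The mask step treats each True as a toggle. logical_xor.accumulate flips the running value at every marker, so the result is True from one marker up to, but not including, the next. A small illustration on a made-up boolean array:

```python
import numpy as np

toggles = np.zeros(8, dtype=bool)
toggles[[1, 4]] = True  # open at row 1, close before row 4
mask = np.logical_xor.accumulate(toggles)
print(mask.astype(int).tolist())  # [0, 1, 1, 1, 0, 0, 0, 0]

# Same result via a running count of toggles: odd counts are "inside" a block
assert (mask == (np.cumsum(toggles) % 2 == 1)).all()
```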

It might be a little fragile with malformed inputs, and I don't think it will handle a trailing -1 gracefully. The only non-O(n) operation is the call to searchsorted, and searchsorted has an optimization that speeds up the search when the keys are themselves sorted, so it probably won't be noticeable.

If you time it on your x, it won't beat the loop version, but for much larger arrays it probably will.
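One way to handle the trailing -1 is to give the mask one spare slot at the end and drop that slot afterwards. Then idx_1s + 1 never runs past the array. The sketch below wraps the steps above into a function. It assumes well-formed input: the first nonzero is a 1 and every block is closed by a -1. The name `fill_blocks` is hypothetical. It starts from zeros rather than x.copy(), which gives the same result on well-formed input.

```python
import numpy as np


def fill_blocks(x):
    """Hypothetical wrapper around the searchsorted / xor steps above.

    Assumes well-formed input: the first nonzero value is a 1 and every block
    opened by a 1 is closed by a later -1.
    """
    x = np.asarray(x)
    idx_1s, = np.where(x == -1)  # closing -1s
    idx1s, = np.where(x == 1)    # all 1s, including repeats inside a block
    y = np.zeros_like(x)
    if len(idx_1s) == 0:
        return y  # no closed block at all
    # opening 1 of each block: the first 1, plus the first 1 after every -1 but the last
    idx0s = idx1s[np.concatenate(([0], np.searchsorted(idx1s, idx_1s[:-1])))]
    # one spare slot so that a -1 in the last row still has a valid "row after it"
    mask = np.zeros(len(x) + 1, dtype=bool)
    mask[idx0s + 1] = True
    mask[idx_1s + 1] = True
    mask = np.logical_xor.accumulate(mask)[:-1]  # drop the spare slot again
    y[mask] = 1
    return y


print(fill_blocks([0, 1, 0, -1]).tolist())  # [0, 0, 1, 1]
```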



This works great:



A = [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]
B = [0] * len(A)  # initialize column B with the same number of zeros
print(A)
for i in range(len(A)):
    # retrieve the indices of the next (1, -1) pair
    try:
        one_index = A.index(1)
        neg_one_index = A.index(-1)
    except ValueError:
        break  # no complete pair left
    one_index = one_index + 1
    # replace the zeros in column B with 1 at the correct locations,
    # and clear the consumed part of A so the next search finds the next pair
    while one_index <= neg_one_index:
        B[one_index] = 1
        A[one_index - 1] = 0
        A[one_index] = 0
        one_index = one_index + 1
print(B)
# output -> [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0] (i.e. correct)
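Each pass above searches A from the beginning again, so the running time grows faster than linearly on long inputs with many pairs. The variant below keeps the same idea in pure Python but resumes each search where the previous pair ended, using the optional start argument of list.index. It also leaves A unmodified. This is a sketch, and `fill_pairs` is a hypothetical name:

```python
def fill_pairs(A):
    """Hypothetical linear-time variant: A is only read, never modified."""
    B = [0] * len(A)
    pos = 0  # where the search for the next opening 1 resumes
    while True:
        try:
            one_index = A.index(1, pos)                 # next opening 1
            neg_one_index = A.index(-1, one_index + 1)  # its closing -1
        except ValueError:
            break  # no complete (1, -1) pair left
        # rows after the 1, up to and including the -1, become 1
        for k in range(one_index + 1, neg_one_index + 1):
            B[k] = 1
        pos = neg_one_index + 1
    return B


A = [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]
print(fill_pairs(A))  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
```

Each element is visited only a few times, so this stays linear. A plain Python loop is still likely to be much slower than the NumPy versions on 10 million elements.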

      
