Finding latest values up to a specific value in numpy / pandas

Question


I have a pandas Series and I want to find the index/position (or a boolean mask) of each last occurrence of one value before another specific value.

E.g., given:

df = pd.DataFrame({'x': np.random.randint(10, size=1000000)})  # one million random ints in [0, 10)

I want to find the positions of every 0 that is the last 0 before a 9. So if my array was

[9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0]

I am only interested in the zeros at positions 3 and 9. I am not worried about the final 0 at position 12, which has no 9 after it. I would rather not have it in the return set, but that is not critical.
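To pin down the intended semantics, here is a plain-Python reference sketch. The function name and its default arguments are hypothetical and do not come from the question. It makes a single left-to-right pass that remembers the most recent 0 and records that position when a 9 arrives. On a million rows it will likely be much slower than the vectorised answers, but it is handy for checking them.

```python
def last_a_before_b_loop(values, a=0, b=9):
    """Hypothetical reference implementation: one left-to-right scan."""
    result = []
    pending = None  # position of the most recent a not yet followed by a b
    for i, v in enumerate(values):
        if v == a:
            pending = i  # a later a supersedes an earlier one
        elif v == b and pending is not None:
            result.append(pending)  # this a is the last one before this b
            pending = None
    return result  # a trailing a with no b after it is never appended


print(last_a_before_b_loop([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0]))  # [3, 9]
```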

My current method:

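df['last'] = np.nan
df.loc[df.x == 0, 'last'] = 0.0     # mark every 0
df.loc[df.x == 9, 'last'] = 1.0     # mark every 9
df['last'] = df['last'].bfill()     # each row takes the next marker at or after it
df.loc[df.x == 0, 'last'] = np.nan  # blank the zeros again...
df['last'] = df['last'].bfill()     # ...so each 0 takes the value of the next non-blank row
df['last'] = df['last'].fillna(0.0) # trailing zeros with no 9 after them
df.loc[df.x != 0, 'last'] = 0.0     # keep the flag only on zeros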

Does anyone have a faster or more concise method?
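For comparison, here is a minimal sketch that keeps the marker-and-backfill idea but reads the next marker strictly after each row, using shift(-1) before bfill(). If you trace the eight steps above on the example array, both zeros at positions 8 and 9 end up flagged. That happens because the second back-fill skips over the whole run of blanked zeros. Looking strictly past each row keeps only the zero at 9. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0])

# 0.0 at each zero, 1.0 at each nine, NaN elsewhere
marker = pd.Series(np.nan, index=x.index)
marker.loc[x == 0] = 0.0
marker.loc[x == 9] = 1.0

# for each row, the marker of the next 0 or 9 strictly after it (NaN if none)
next_marker = marker.shift(-1).bfill()

# flag the zeros whose next 0-or-9 is a 9
last = ((x == 0) & (next_marker == 1.0)).astype(float)
print(last[last == 1.0].index.tolist())  # [3, 9]
```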


python numpy pandas dataframe

user1027953, Nov 4 '14 at 16:25


3 answers

You can use boolean indexing and shift. For example:

>>> s = pd.Series([9, 0, 3, 0, 9, 4, 9, 0, 0, 9, 4, 0])
>>> s[(s == 0) & (s.shift(-1) == 9)]
3    0
8    0
dtype: int64

This finds the index locations in s that have a value of 0 and are immediately followed by 9.
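The mask compares each element only with its immediate neighbour, so a zero separated from the next 9 by some other value is not matched. Here is a quick check on the question's original array, where a 1 sits between the zero at position 3 and the 9 at position 5. The helper name is hypothetical.

```python
import pandas as pd

def zeros_right_before_nine(s):
    """Hypothetical helper: labels of zeros whose very next element is a 9."""
    return s[(s == 0) & (s.shift(-1) == 9)].index.tolist()


question = pd.Series([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0])
print(zeros_right_before_nine(question))  # [9]; position 3 is missed
```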

Edit: slightly adapted to allow other values between the last preceding zero and the 9 (see also @acushner's answer).

Here's a slightly modified series s; we still want the zeros at indices 3 and 8:

>>> s = pd.Series([9, 0, 3, 0, 9, 4, 9, 0, 0, 4, 9, 0])
>>> t = s[(s == 0) | (s == 9)]
>>> t
0     9
1     0
3     0
4     9
6     9
7     0
8     0
10    9
11    0
dtype: int64

t is a series containing only the nines and zeros from s, with their original index labels. We can get the corresponding indices in the same way as before:

>>> t[(t == 0) & (t.shift(-1) == 9)]
3    0
8    0
dtype: int64


Alex Riley, Nov 4 '14 at 16:31


I think this works for general inputs:

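import numpy as np

def find_last_a_before_b(arr, a, b):
    arr = np.asarray(arr)
    idx_a, = np.where(arr == a)  # positions of every a
    idx_b, = np.where(arr == b)  # positions of every b
    if idx_a.size == 0:
        return idx_a
    # for each a, the index (into idx_b) of the first b that follows it
    iss = idx_b.searchsorted(idx_a)
    # keep an a only if the next a is followed by a different b,
    # and keep the final a only if some b comes after it
    mask = np.concatenate((iss[1:] != iss[:-1], [iss[-1] < len(idx_b)]))
    return idx_a[mask]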

>>> find_last_a_before_b([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0], 0, 9)
array([3, 9])
>>> find_last_a_before_b([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0], 9, 0)
array([ 0,  7, 10])

The key is to use np.searchsorted to find which 9 comes after each 0. Then discard the repeats, keeping only the last 0 before each 9, and drop the final 0 if no 9 follows it.
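The intermediate arrays for the question's example make the searchsorted step concrete. This sketch inlines the function body with a = 0 and b = 9. It uses np.flatnonzero in place of np.where and np.append in place of np.concatenate; both substitutions are equivalent here.

```python
import numpy as np

arr = np.array([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0])
idx_a = np.flatnonzero(arr == 0)  # positions of the 0s: 1, 3, 8, 9, 12
idx_b = np.flatnonzero(arr == 9)  # positions of the 9s: 0, 5, 7, 10
# index into idx_b of the first 9 after each 0: 1, 1, 3, 3, 4 (4 == len(idx_b): no 9 follows)
iss = idx_b.searchsorted(idx_a)
# last 0 of each group sharing the same next 9, and drop the trailing 0 with no 9 after it
keep = np.append(iss[1:] != iss[:-1], iss[-1] < len(idx_b))
print(idx_a[keep])  # [3 9]
```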


Jaime, Nov 4 '14 at 19:50


acushner · Accepted Answer · 2014-11-04T16:55:11+0000

Very simple, building on @ajcr's answer:

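import pandas as pd
s = pd.Series([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0])  # the example array from the question
s = s[s.isin([0, 9])]  # keep only the 0s and 9s, with their original index
s[(s == 0) & (s.shift(-1) == 9)]  # 0s whose next kept value is a 9: index 3 and 9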
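The same filter-then-compare idea also works on a plain NumPy array without pandas. This is a minimal sketch that uses np.isin and positional slicing in place of Series.isin and shift; the variable names are illustrative.

```python
import numpy as np

arr = np.array([9, 0, 3, 0, 1, 9, 4, 9, 0, 0, 9, 4, 0])
pos = np.flatnonzero(np.isin(arr, [0, 9]))  # positions of all 0s and 9s
vals = arr[pos]                             # the 0s and 9s, in order
# a 0 whose next kept value is a 9 is the last 0 before that 9
hits = pos[:-1][(vals[:-1] == 0) & (vals[1:] == 9)]
print(hits.tolist())  # [3, 9]
```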
