Finding the index of all occurrences of a pattern in a Pandas DataFrame
I am using a date-indexed Pandas DataFrame that looks something like this:
TimeSys_Index
2014-08-29 00:00:18 0
2014-08-29 00:00:19 0
2014-08-29 00:00:20 1
2014-08-29 00:00:21 1
2014-08-29 00:00:22 0
2014-08-29 00:00:23 0
2014-08-29 00:00:24 0
2014-08-29 00:00:25 0
2014-08-29 00:00:26 0
2014-08-29 00:00:27 1
2014-08-29 00:00:28 1
2014-08-29 00:00:29 1
2014-08-29 00:00:30 1
2014-08-29 00:00:31 0
2014-08-29 00:00:32 0
2014-08-29 00:00:33 0
...
I want to find the index (time) at which each occurrence of the pattern [0, 0, 1, 1] starts. Using the above sequence, I would like it to return ['2014-08-29 00:00:18', '2014-08-29 00:00:25']. This operation must be vectorized, or at least very fast.
I was thinking about correlating the full vector with the template vector and finding the indices where the resulting vector is 4, but there should be an easier way.
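For reference, the correlation idea could be sketched roughly as below. This is only an illustration under some assumptions: the column is named `val`, the sample data is retyped from the listing above, and both the data and the pattern are remapped from 0/1 to -1/+1 so that a full match scores exactly the pattern length (4 here).

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; the column name "val" is an assumption.
values = [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
index = pd.to_datetime("2014-08-29 00:00:18") + pd.to_timedelta(np.arange(len(values)), unit="s")
df = pd.DataFrame({"val": values}, index=index)
df.index.name = "TimeSys_Index"

pattern = np.array([0, 0, 1, 1])

# Map 0 -> -1 and 1 -> +1 so each matching position adds +1 and each mismatch adds -1.
signal = 2 * df["val"].to_numpy() - 1
template = 2 * pattern - 1

# mode="valid" keeps only fully overlapping windows; score[k] covers rows k .. k+3.
score = np.correlate(signal, template, mode="valid")

# A window matches the pattern only if every position agrees, i.e. score == 4.
starts = np.flatnonzero(score == len(pattern))
matches = df.index[starts]
print(matches)
```

Without the -1/+1 remapping, a window such as [1, 1, 1, 1] would score the same as [0, 0, 1, 1], so the raw 0/1 correlation alone could not tell them apart.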
1 answer
You can compare the column against shifted copies of itself:
>>> df.head()
                     val
TimeSys_Index
2014-08-29 00:00:18    0
2014-08-29 00:00:19    0
2014-08-29 00:00:20    1
2014-08-29 00:00:21    1
2014-08-29 00:00:22    0
>>> i = (df['val'] == 0) & (df['val'].shift(-1) == 0)
>>> i &= (df['val'].shift(-2) == 1) & (df['val'].shift(-3) == 1)
>>> df.index[i]
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-08-29 00:00:18, 2014-08-29 00:00:25]
Length: 2, Freq: None, Timezone: None
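The same shift-and-compare idea can be generalized to a pattern of any length. The following is a minimal, self-contained sketch rather than a definitive implementation: `find_pattern_starts` is a hypothetical helper name, the column name `val` is assumed, and the sample data is retyped from the question.

```python
import numpy as np
import pandas as pd

def find_pattern_starts(series, pattern):
    """Return the index labels at which `pattern` begins in `series`.

    Hypothetical helper for illustration; not part of pandas.
    """
    mask = pd.Series(True, index=series.index)
    for offset, value in enumerate(pattern):
        # shift(-offset) lines up the value `offset` rows ahead with the current row.
        # Rows shifted in at the end are NaN, which never compare equal to `value`.
        mask &= series.shift(-offset) == value
    return series.index[mask.to_numpy()]

# Sample data retyped from the question; the column name "val" is an assumption.
values = [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
index = pd.to_datetime("2014-08-29 00:00:18") + pd.to_timedelta(np.arange(len(values)), unit="s")
df = pd.DataFrame({"val": values}, index=index)

result = find_pattern_starts(df["val"], [0, 0, 1, 1])
print(result)
```

Each comparison is vectorized, so the Python-level loop only runs once per pattern element, not once per row. Overlapping occurrences are all reported, since every row is tested independently as a possible start.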