Finding the index of every occurrence of a pattern in a Pandas DataFrame

I am using a date-indexed Pandas DataFrame that looks something like this:

TimeSys_Index
2014-08-29 00:00:18    0
2014-08-29 00:00:19    0
2014-08-29 00:00:20    1
2014-08-29 00:00:21    1
2014-08-29 00:00:22    0
2014-08-29 00:00:23    0
2014-08-29 00:00:24    0
2014-08-29 00:00:25    0
2014-08-29 00:00:26    0
2014-08-29 00:00:27    1
2014-08-29 00:00:28    1
2014-08-29 00:00:29    1
2014-08-29 00:00:30    1
2014-08-29 00:00:31    0
2014-08-29 00:00:32    0
2014-08-29 00:00:33    0
...

      

I want to find the index (time) of each occurrence of the pattern [0, 0, 1, 1]. For the sequence above, it should return ['2014-08-29 00:00:18', '2014-08-29 00:00:25']. The solution must be vectorized, or at least very fast.

I was thinking about correlating the full vector with the pattern vector and finding the indices where the resulting vector equals 4, but there should be an easier way.
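For reference, the correlation idea could look roughly like the sketch below. It assumes the column holds only 0 and 1, and it first maps those values to -1/+1 so that a window scores exactly the pattern length (4 here) only when every position agrees. The function name `find_pattern_by_correlation` and the sample variables are made up for illustration.

```python
import numpy as np
import pandas as pd

def find_pattern_by_correlation(series, pattern):
    """Return the index labels where `pattern` starts (assumes 0/1 data only)."""
    if len(series) < len(pattern):
        return series.index[:0]
    # Map 0/1 to -1/+1: each agreeing position adds +1, each mismatch adds -1.
    signal = 2 * series.to_numpy(dtype=int) - 1
    template = 2 * np.asarray(pattern, dtype=int) - 1
    # 'valid' mode yields one score per full-length window.
    scores = np.correlate(signal, template, mode="valid")
    starts = np.flatnonzero(scores == len(template))
    return series.index[starts]

# Sample data copied from the question (16 one-second samples).
corr_index = pd.to_datetime("2014-08-29 00:00:18") + pd.to_timedelta(np.arange(16), unit="s")
corr_series = pd.Series([0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0], index=corr_index)
corr_matches = find_pattern_by_correlation(corr_series, [0, 0, 1, 1])
```

Without the -1/+1 mapping, a plain 0/1 correlation cannot tell [0, 0, 1, 1] apart from [1, 1, 1, 1], since the zeros in the pattern contribute nothing to the score.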

1 answer


You can compare the column against shifted copies of itself:



>>> df.head()
                     val
TimeSys_Index           
2014-08-29 00:00:18    0
2014-08-29 00:00:19    0
2014-08-29 00:00:20    1
2014-08-29 00:00:21    1
2014-08-29 00:00:22    0
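>>> # True where the current value and the next one are both 0
>>> i = (df['val'] == 0) & (df['val'].shift(-1) == 0)
>>> # ...and the two values after that are both 1
>>> i &= (df['val'].shift(-2) == 1) & (df['val'].shift(-3) == 1)
>>> df.index[i]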
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-08-29 00:00:18, 2014-08-29 00:00:25]
Length: 2, Freq: None, Timezone: None
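The same shift-and-compare idea can be wrapped up for a pattern of any length. The following is only a sketch under a few assumptions: the pattern is non-empty, the helper name `find_pattern` is hypothetical, and the sample data is copied from the question.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

def find_pattern(series, pattern):
    """Return the index labels at which `pattern` starts in `series`."""
    # One boolean mask per pattern position: shift(-k) aligns the value
    # k rows ahead with the current row. The NaNs that shifting introduces
    # at the tail compare as False, so incomplete windows never match.
    masks = [series.shift(-offset) == value
             for offset, value in enumerate(pattern)]
    mask = reduce(operator.and_, masks)
    return series.index[mask.to_numpy()]

# Sample data copied from the question (16 one-second samples).
index = pd.to_datetime("2014-08-29 00:00:18") + pd.to_timedelta(np.arange(16), unit="s")
df = pd.DataFrame({"val": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]}, index=index)
matches = find_pattern(df["val"], [0, 0, 1, 1])
```

This builds one shifted copy of the column per pattern element, which is fine for short patterns but grows in memory use as the pattern gets longer.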

      
