Pandas: find a sequence or pattern in a column

Here are some sample data for the problem I'm working on:

index     Quarter    Sales_Growth
0          2001q1    0
1          2002q2    0
2          2002q3    1
3          2002q4    0
4          2003q1    0
5          2004q2    0
6          2004q3    1
7          2004q4    1
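
To make the answers easy to try, here is a minimal sketch that rebuilds the sample table as a DataFrame named df. The values are copied from the table above; the name df is simply what the answers assume, and the default 0..7 index matches the "index" column.

```python
import pandas as pd

# Sample data copied from the question's table (default RangeIndex 0..7)
df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],  # 1 = growth, 0 = no growth
})
print(df)
```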

The column Sales_Growth indicates whether sales actually increased in that quarter: 0 = no growth, 1 = growth.

First, I am trying to return the first Quarter of the first run of two consecutive quarters with no sales growth.

With the data above, the answer would be 2001q1.

Then I want to return the second of two consecutive quarters of sales growth that occur AFTER the initial two quarters of no growth.

The answer to this question would be 2004q4.

I have searched extensively, but I cannot get the closest answer I found (on Stack Overflow) to work for my case.

Thanks in advance for helping a pandas newbie; I'm hacking away as best I can but I'm stuck on this.

+1

3 answers


You are searching for a contiguous subsequence. It's a bit of a hack, but bear with me:

growth = df.Sales_Growth.astype(str).str.cat()

This gives you:

'00100011'

Then:

growth.index('0011')

This returns 4, the position where the match starts. Add 3 (the pattern length minus one) to get the position of the last row matched by the pattern, 7, which is 2004q4.
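
A minimal sketch that turns the string positions back into quarters for both questions, rebuilding the sample data from the question with its default 0..7 index (so string positions and row positions coincide, which is why iloc is used). It swaps str.index for str.find, which returns -1 instead of raising ValueError when the pattern is absent.

```python
import pandas as pd

df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],
})

# Concatenate the 0/1 flags into one string: '00100011'
growth = df.Sales_Growth.astype(str).str.cat()

# Q1: the first quarter of the first '00' run
q1_pos = growth.find('00')  # -1 if there is no match
q1 = df.Quarter.iloc[q1_pos] if q1_pos != -1 else None

# Q2: the last quarter of the first '0011' run (start + len - 1)
pattern = '0011'
q2_pos = growth.find(pattern)
q2 = df.Quarter.iloc[q2_pos + len(pattern) - 1] if q2_pos != -1 else None

print(q1, q2)  # 2001q1 2004q4
```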

I admit this approach starts off a bit ugly, but the end result is very handy: you can search for any fixed pattern without additional code.
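
One caveat: the string positions only line up with row positions while every value renders as a single character, which holds for the 0/1 flags here. Under that assumption, a sketch of a small helper, find_pattern_starts (a hypothetical name, not a pandas function), that returns every starting position, including overlapping matches:

```python
from typing import List

import pandas as pd


def find_pattern_starts(series: pd.Series, pattern: str) -> List[int]:
    """Hypothetical helper: row positions where `pattern` starts, overlaps included.

    Assumes every value in `series` renders as exactly one character.
    """
    text = series.astype(str).str.cat()
    starts = []
    pos = text.find(pattern)
    while pos != -1:
        starts.append(pos)
        # Resume one character later so overlapping matches are not skipped
        pos = text.find(pattern, pos + 1)
    return starts


df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],
})
print(find_pattern_starts(df.Sales_Growth, '00'))    # [0, 3, 4]
print(find_pattern_starts(df.Sales_Growth, '0011'))  # [4]
```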

+2

For Q1:

# temp is 0 only where this quarter and the next one both had no growth
temp = df.Sales_Growth + df.Sales_Growth.shift(-1)
df[temp == 0].head(1)
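
If you want the Quarter value itself rather than a one-row DataFrame, and a defined result when no pair exists, here is a minimal sketch along the same lines. It replaces the sum with explicit == 0 tests on both quarters, which does not rely on the flags being non-negative; the sample data is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],
})

# True where this quarter and the next one both had no growth;
# shift(-1) is NaN on the last row, and NaN == 0 is False
no_growth_pair = (df.Sales_Growth == 0) & (df.Sales_Growth.shift(-1) == 0)
matches = df.loc[no_growth_pair, 'Quarter']
first_quarter = matches.iloc[0] if not matches.empty else None
print(first_quarter)  # 2001q1
```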

For Q2:

# growth this quarter and last, immediately after two quarters without growth
df[(df.Sales_Growth == 1) & (df.Sales_Growth.shift(1) == 1)
   & (df.Sales_Growth.shift(2) == 0) & (df.Sales_Growth.shift(3) == 0)].head(1)
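
The same shift idea generalizes to any pattern by AND-ing one comparison per element. A sketch with a hypothetical helper, pattern_end_mask, which marks the row where each match ends; for [0, 0, 1, 1] it builds exactly the four shift conditions of the Q2 line above.

```python
import pandas as pd


def pattern_end_mask(s: pd.Series, pattern: list) -> pd.Series:
    """Hypothetical helper: True at each row where `pattern` ends."""
    n = len(pattern)
    mask = pd.Series(True, index=s.index)
    for offset, value in enumerate(pattern):
        # pattern[offset] must sit (n - 1 - offset) rows above the current row
        mask &= s.shift(n - 1 - offset) == value
    return mask


df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],
})
q2_mask = pattern_end_mask(df.Sales_Growth, [0, 0, 1, 1])
print(df.loc[q2_mask, 'Quarter'].iloc[0])  # 2004q4
```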

+2

Building on earlier answers. Q1:

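import numpy as np
# rolling_apply is gone from current pandas; rolling(...).apply(...) replaces it, and func must return a scalar
temp = df.Sales_Growth.rolling(window=2, min_periods=2).apply(
    lambda x, pattern: float(np.array_equal(x, pattern)), raw=True, kwargs={'pattern': [0, 0]})
# each window is labelled by its last row, so shift back one row to label the first quarter
print(df[temp.shift(-1) == 1].head(1))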

In the rolling call, window and min_periods must equal the length of the pattern list passed to the applied function.
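
A rolling window is labelled by its last row, so a [0, 0] match is reported at 2002q2 rather than at 2001q1, the quarter Q1 asks for. A sketch, using the Series.rolling(...).apply(...) API (pd.rolling_apply has been removed from pandas) and the sample data rebuilt from the question, that recovers both the first and the last quarter of every match:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],
})

pattern = [0, 0]
n = len(pattern)
# 1.0 where the window ending at this row equals the pattern, NaN for the first n - 1 rows
hits = df.Sales_Growth.rolling(window=n, min_periods=n).apply(
    lambda x: float(np.array_equal(x, pattern)), raw=True)

end_pos = np.flatnonzero(hits.to_numpy() == 1)  # last row of each match
start_pos = end_pos - (n - 1)                     # first row of each match
pairs = list(zip(df.Quarter.iloc[start_pos], df.Quarter.iloc[end_pos]))
print(pairs)
# [('2001q1', '2002q2'), ('2002q4', '2003q1'), ('2003q1', '2004q2')]
```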

Q2: same approach, different pattern:

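import numpy as np
temp = df.Sales_Growth.rolling(window=4, min_periods=4).apply(
    lambda x, pattern: float(np.array_equal(x, pattern)), raw=True, kwargs={'pattern': [0, 0, 1, 1]})
# the match is labelled at its last row, which is the quarter Q2 asks for
print(df[temp == 1].head(1))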
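
rolling(...).apply calls a Python function once per window, which can be slow on long columns. A vectorized alternative, assuming NumPy 1.20 or later for sliding_window_view, compares every window to the pattern at once. match_starts is a hypothetical helper name, and the sketch assumes each pattern has at least one match (indexing [0] on an empty result would raise IndexError).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Quarter': ['2001q1', '2002q2', '2002q3', '2002q4',
                '2003q1', '2004q2', '2004q3', '2004q4'],
    'Sales_Growth': [0, 0, 1, 0, 0, 0, 1, 1],
})


def match_starts(values, pattern):
    """Hypothetical helper: positions where `pattern` starts in `values`, overlaps included."""
    pattern = np.asarray(pattern)
    # One row per run of len(pattern) consecutive values (NumPy >= 1.20)
    windows = np.lib.stride_tricks.sliding_window_view(values, len(pattern))
    return np.flatnonzero((windows == pattern).all(axis=1))


values = df.Sales_Growth.to_numpy()
q1 = df.Quarter.iloc[match_starts(values, [0, 0])[0]]  # first quarter of first match
q2_start = match_starts(values, [0, 0, 1, 1])[0]
q2 = df.Quarter.iloc[q2_start + 3]                      # last quarter of first match
print(q1, q2)  # 2001q1 2004q4
```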

+2