Pandas: find a sequence or pattern in a column
Here are some sample data for the problem I'm working on:
index  Quarter  Sales_Growth
0      2001q1   0
1      2002q2   0
2      2002q3   1
3      2002q4   0
4      2003q1   0
5      2004q2   0
6      2004q3   1
7      2004q4   1
The column Sales_Growth indicates whether sales increased in the quarter: 0 = no growth, 1 = growth.
First, I am trying to return the first Quarter in which there were two quarters in a row with no sales growth. With the data above, the answer would be 2001q1.
Then I want to return the second consecutive quarter of sales growth that occurs AFTER the initial two quarters of no growth. The answer to this question would be 2004q4.
I have searched and searched, but I cannot get the closest answer I found (on Stack Overflow) to work.
Thanks in advance for helping a Pandas newbie. I'm hacking as best I can, but I'm stuck on this.
You are searching for a subsequence. It's a little weird, but bear with me:
growth = df.Sales_Growth.astype(str).str.cat()
This gives you:
'00100011'
Then:
growth.index('0011')
This gives you 4, the position where the pattern starts. Add the constant 3 (the pattern length minus one) to get the index of the last row matched by the pattern.
I feel like this approach starts off a bit ugly, but the end result is really useful - you can search for any fixed pattern without additional coding.
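As a minimal end-to-end sketch of the string-search idea on the sample data: the DataFrame below is rebuilt from the table in the question, and `match_position` is a hypothetical helper name, not part of pandas. It assumes a default 0..n-1 index and single-character flags, so that a position in the string equals a row position.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})

# One character per row: '00100011'.
growth = df["Sales_Growth"].astype(str).str.cat()


def match_position(text, pattern, start=0):
    """Hypothetical helper: start position of the first match, or None."""
    pos = text.find(pattern, start)  # str.find returns -1 instead of raising
    return None if pos == -1 else pos


# Q1: first row of the first run of two no-growth quarters.
q1_pos = match_position(growth, "00")
first_no_growth = None if q1_pos is None else df["Quarter"].iloc[q1_pos]

# Q2: last row of the first '0011' match at or after the Q1 match.
pattern = "0011"
q2_pos = None if q1_pos is None else match_position(growth, pattern, q1_pos)
second_growth = (
    None if q2_pos is None
    else df["Quarter"].iloc[q2_pos + len(pattern) - 1]
)

print(first_no_growth, second_growth)
```

Using `str.find` rather than `str.index` avoids a `ValueError` when the pattern never occurs; the helper turns the -1 sentinel into `None` so it cannot be mistaken for "last row" by `iloc`.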
Building on the earlier answers. Q1:
temp = df.Sales_Growth.rolling(window=2, min_periods=2).apply(
    lambda x, pattern: float((x == pattern).all()),  # 1.0 when the window equals the pattern
    raw=True, kwargs={'pattern': [0, 0]})
print(df[temp == 1].head())
In the rolling call, window and min_periods must match the length of the pattern list passed to the function. Each window is aligned to its last row, so the rows printed are those where the pattern ends.
Q2: same approach, different pattern:
temp = df.Sales_Growth.rolling(window=4, min_periods=4).apply(
    lambda x, pattern: float((x == pattern).all()),  # 1.0 when the window equals the pattern
    raw=True, kwargs={'pattern': [0, 0, 1, 1]})
print(df[temp == 1].head())
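For reference, here is a self-contained sketch of the rolling-window approach on the sample data. It assumes a pandas version that provides `Series.rolling(...).apply(..., raw=True)` and a default 0..n-1 index; `ends_pattern` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def ends_pattern(series, pattern):
    """Hypothetical helper: positions of rows where `pattern` ends."""
    pattern = np.asarray(pattern, dtype=float)
    n = len(pattern)  # window and min_periods must equal the pattern length
    hits = series.rolling(window=n, min_periods=n).apply(
        lambda window: float(np.array_equal(window, pattern)), raw=True
    )
    # Incomplete leading windows are NaN, which compares unequal to 1.
    return np.flatnonzero((hits == 1).to_numpy())


q1_rows = ends_pattern(df["Sales_Growth"], [0, 0])
q2_rows = ends_pattern(df["Sales_Growth"], [0, 0, 1, 1])

# Q1 asks for the first row of the pattern, so step back one row from its end.
first_no_growth = df["Quarter"].iloc[q1_rows[0] - 1] if q1_rows.size else None
# Q2 asks for the last row of the pattern, which is where the window ends.
second_growth = df["Quarter"].iloc[q2_rows[0]] if q2_rows.size else None

print(first_no_growth, second_growth)
```

Because the window is right-aligned, the match positions point at the end of each pattern; subtracting the pattern length minus one recovers the starting row.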