Incrementing counter based on some value with np.where
I am trying to increment a counter while processing a Pandas series using np.where based on some time intervals of days. For example, if I have a series with the following values:
Date Value
01/03/2017 5
02/03/2017 8
03/03/2017 3
04/03/2017 7
12/03/2017 1
13/03/2017 3
14/03/2017 4
I then compute the gap in days between consecutive dates:
df['DIFF'] = df['Date'].diff()/np.timedelta64(1, 'D')
This gives the following data frame:
Date Value DIFF
01/03/2017 5 0
02/03/2017 8 1
03/03/2017 3 1
04/03/2017 7 1
12/03/2017 1 8
13/03/2017 3 1
14/03/2017 4 1
Next, I want to create a LIFETIME column that counts the number of lives, where a gap of, say, 4 days or more starts a new life:
Date Value DIFF LIFETIME
01/03/2017 5 0 1
02/03/2017 8 1 1
03/03/2017 3 1 1
04/03/2017 7 1 1
12/03/2017 1 8 2
13/03/2017 3 1 2
14/03/2017 4 1 2
I think I am almost there with this code:
df['LIFE'] = np.where(df['DIFF'] >=4, life_counter=df.shift(-1)+1, df.shift(-1))
The logic here is that if DIFF is greater than or equal to 4, LIFE is set to the previous value + 1. Otherwise, it stays the same as the previous value. It seemed like a neat way to carry the state. However, the result seems to ignore the state I have set, probably because of the way np.where works. Does anyone know a way to make this work? Currently my output looks like this:
Date Value DIFF LIFETIME
01/03/2017 5 0 1
02/03/2017 8 1 1
03/03/2017 3 1 1
04/03/2017 7 1 1
12/03/2017 1 8 2
13/03/2017 3 1 1
14/03/2017 4 1 1
I believe you just want the cumulative sum of the boolean array, plus 1:
>>> df
Date Value DIFF
0 01/03/2017 5 0
1 02/03/2017 8 1
2 03/03/2017 3 1
3 04/03/2017 7 1
4 12/03/2017 1 8
5 13/03/2017 3 1
6 14/03/2017 4 1
>>> df['LIFETIME'] = np.cumsum(df.DIFF >= 4) + 1
>>> df
Date Value DIFF LIFETIME
0 01/03/2017 5 0 1
1 02/03/2017 8 1 1
2 03/03/2017 3 1 1
3 04/03/2017 7 1 1
4 12/03/2017 1 8 2
5 13/03/2017 3 1 2
6 14/03/2017 4 1 2
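The reason np.where cannot do this alone is that it evaluates both branches over the whole column at once, so no row ever sees the value computed for the row before it. Note also that shift(-1) looks at the next row, not the previous one. A cumulative sum carries the running count instead. Below is a minimal self-contained sketch of the whole pipeline, assuming the dates are day-first strings (dd/mm/yyyy) and that the first row's missing difference should be treated as 0, as in the tables above:

```python
import numpy as np
import pandas as pd

# Sample data from the question; dates are assumed to be dd/mm/yyyy.
df = pd.DataFrame({
    "Date": ["01/03/2017", "02/03/2017", "03/03/2017", "04/03/2017",
             "12/03/2017", "13/03/2017", "14/03/2017"],
    "Value": [5, 8, 3, 7, 1, 3, 4],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Gap in days to the previous row. diff() leaves the first row empty,
# so it is filled with 0 here (an assumption matching the question's table).
df["DIFF"] = (df["Date"].diff() / np.timedelta64(1, "D")).fillna(0)

# Each gap of 4 or more days marks the start of a new life.
# The cumulative sum of those marks is the running life count (starting at 0),
# so adding 1 makes the first life number 1.
df["LIFETIME"] = (df["DIFF"] >= 4).cumsum() + 1

lifetimes = df["LIFETIME"].tolist()
print(df)
```

The threshold 4 is the value from the question; change the comparison to > if a gap of exactly 4 days should not start a new life.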