Incrementing counter based on some value with np.where

I am trying to increment a counter while processing a pandas Series with np.where, based on the interval in days between dates. For example, suppose I have a Series with the following values:

Date        Value
01/03/2017  5
02/03/2017  8
03/03/2017  3
04/03/2017  7
12/03/2017  1
13/03/2017  3
14/03/2017  4

I then compute the gap in days between consecutive rows:

df['DIFF'] = (df['Date'].diff() / np.timedelta64(1, 'D')).fillna(0)

This gives the following DataFrame:

Date        Value  DIFF
01/03/2017  5      0
02/03/2017  8      1
03/03/2017  3      1
04/03/2017  7      1
12/03/2017  1      8
13/03/2017  3      1
14/03/2017  4      1
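
For reference, here is a minimal, self-contained sketch that builds the sample data and the DIFF column. It assumes the dates are day-first strings (DD/MM/YYYY), which is what the 8-day gap between 04/03 and 12/03 implies, and it fills the NaN that diff() leaves in the first row with 0 to match the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/03/2017", "02/03/2017", "03/03/2017", "04/03/2017",
             "12/03/2017", "13/03/2017", "14/03/2017"],
    "Value": [5, 8, 3, 7, 1, 3, 4],
})

# Assumption: dates are day-first (DD/MM/YYYY), so parse with an explicit format.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# diff() gives timedeltas (NaT in the first row); dividing by one day yields
# float day counts, and fillna(0) replaces the leading NaN with 0.
df["DIFF"] = (df["Date"].diff() / np.timedelta64(1, "D")).fillna(0)
print(df)
```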

Next, I want to create a LIFETIME column that counts lifetimes, treating a time difference greater than, say, 4 days as the start of a new lifetime:

Date        Value  DIFF   LIFETIME
01/03/2017  5      0      1
02/03/2017  8      1      1
03/03/2017  3      1      1
04/03/2017  7      1      1
12/03/2017  1      8      2
13/03/2017  3      1      2
14/03/2017  4      1      2
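
The rule described above can be sketched as a plain Python loop that carries the counter from row to row. This is only an illustration of the intended logic, not a vectorized solution; the threshold of 4 and the use of >= (matching the np.where attempt below) come from the question's example, and THRESHOLD is a hypothetical name.

```python
diffs = [0, 1, 1, 1, 8, 1, 1]   # DIFF column from the table above
THRESHOLD = 4                   # hypothetical name: gap (in days) that starts a new lifetime

lifetime = []
counter = 1                     # the first row belongs to lifetime 1
for d in diffs:
    if d >= THRESHOLD:          # a large gap starts a new lifetime...
        counter += 1
    lifetime.append(counter)    # ...and the updated counter carries forward

print(lifetime)  # [1, 1, 1, 1, 2, 2, 2]
```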

I think I am almost there with this code:

df['LIFE'] = np.where(df['DIFF'] >=4, life_counter=df.shift(-1)+1, df.shift(-1))

The logic is that if DIFF is greater than or equal to 4, LIFE should be set to the previous value + 1; otherwise it should stay the same as the previous value. It seemed like a neat way to carry the state forward. However, the result seems to ignore the state I set, probably because of the way np.where works. Does anyone know a way to make this work? Currently my output looks like this:

Date        Value  DIFF   LIFETIME
01/03/2017  5      0      1
02/03/2017  8      1      1
03/03/2017  3      1      1
04/03/2017  7      1      1
12/03/2017  1      8      2
13/03/2017  3      1      1
14/03/2017  4      1      1
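
A minimal sketch of why this happens, assuming LIFE starts as a column of 1s and that "previous" means shift(1) (the question's shift(-1) actually looks at the next row, and the snippet above passes a keyword argument before a positional one, which Python rejects as a syntax error). np.where evaluates both branches on the whole column at once, before the assignment, so the 2 produced at row 4 is never fed into row 5:

```python
import numpy as np
import pandas as pd

diff = pd.Series([0, 1, 1, 1, 8, 1, 1])   # DIFF column from the question
life = pd.Series(1, index=diff.index)     # assumption: LIFE starts as all 1s

# "Previous row" values, computed once from the column as it is right now.
prev = life.shift(1, fill_value=1)

# Both branches are evaluated up front, so the +1 at row 4 is not seen by row 5.
life = pd.Series(np.where(diff >= 4, prev + 1, prev), index=diff.index)
print(life.tolist())  # [1, 1, 1, 1, 2, 1, 1]
```

This reproduces the output above: each row only sees the old column, so the increment does not propagate.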

1 answer


I believe you just want the cumulative sum of the boolean array, offset by 1:

>>> df
         Date  Value  DIFF
0  01/03/2017      5     0
1  02/03/2017      8     1
2  03/03/2017      3     1
3  04/03/2017      7     1
4  12/03/2017      1     8
5  13/03/2017      3     1
6  14/03/2017      4     1
>>> df['LIFETIME'] = np.cumsum(df.DIFF >= 4) + 1
>>> df
         Date  Value  DIFF  LIFETIME
0  01/03/2017      5     0         1
1  02/03/2017      8     1         1
2  03/03/2017      3     1         1
3  04/03/2017      7     1         1
4  12/03/2017      1     8         2
5  13/03/2017      3     1         2
6  14/03/2017      4     1         2
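
To see why the cumulative sum works, it helps to print the intermediate boolean mask. The sketch below uses the same DIFF values, but leaves NaN in the first row (as diff() produces without fillna); comparisons against NaN evaluate to False, so the first row still falls in lifetime 1. The explicit cast to int is a defensive choice, not something the answer requires.

```python
import numpy as np
import pandas as pd

# DIFF as produced by diff() without fillna: the first row is NaN.
diff = pd.Series([np.nan, 1, 1, 1, 8, 1, 1])

new_life = diff >= 4            # NaN >= 4 evaluates to False
print(new_life.tolist())        # [False, False, False, False, True, False, False]

# Each True bumps the running total once, and the bump persists for every
# later row; adding 1 makes the first lifetime number 1 instead of 0.
lifetime = new_life.astype(int).cumsum() + 1
print(lifetime.tolist())        # [1, 1, 1, 1, 2, 2, 2]
```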
