Python pandas cumsum() reset after hitting max
I have a pandas DataFrame with timedeltas and the cumulative sum of those deltas in a separate column, expressed in milliseconds. An example is shown below:
Transaction_ID Time TimeDelta CumSum[ms]
1 00:00:04.500 00:00:00.000 000
2 00:00:04.600 00:00:00.100 100
3 00:00:04.762 00:00:00.162 262
4 00:00:05.543 00:00:00.781 1043
5 00:00:09.567 00:00:04.024 5067
6 00:00:10.654 00:00:01.087 6154
7 00:00:14.300 00:00:03.646 9800
8 00:00:14.532 00:00:00.232 10032
9 00:00:16.500 00:00:01.968 12000
10 00:00:17.543 00:00:01.043 13043
I would like to be able to supply a maximum value for CumSum[ms]; whenever the running total exceeds it, the cumulative sum should start again at 0. For example, if the maximum value in the above example were 3000, the results would look like this:
Transaction_ID Time TimeDelta CumSum[ms]
1 00:00:04.500 00:00:00.000 000
2 00:00:04.600 00:00:00.100 100
3 00:00:04.762 00:00:00.162 262
4 00:00:05.543 00:00:00.781 1043
5 00:00:09.567 00:00:04.024 0
6 00:00:10.654 00:00:01.087 1087
7 00:00:14.300 00:00:03.646 0
8 00:00:14.532 00:00:00.232 232
9 00:00:16.500 00:00:01.968 2200
10 00:00:17.543 00:00:01.043 0
I investigated using the modulo operator, but I only managed to get back to zero when the resulting cumulative sum is exactly equal to the provided limit (i.e. 500 % 500 is zero).
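To illustrate why modulo falls short, here is a minimal sketch using the deltas from the table above. The names deltas_ms, limit, wrapped and desired are just illustrative. Modulo wraps the total around and keeps the remainder, whereas the desired behaviour discards it and restarts from 0:

```python
from itertools import accumulate

# TimeDelta values from the example table, in milliseconds
deltas_ms = [0, 100, 162, 781, 4024, 1087, 3646, 232, 1968, 1043]
limit = 3000

# ordinary cumulative sum, then wrapped with modulo
plain_cumsum = list(accumulate(deltas_ms))
wrapped = [total % limit for total in plain_cumsum]

# the CumSum[ms] column from the desired-output table
desired = [0, 100, 262, 1043, 0, 1087, 0, 232, 2200, 0]

print(wrapped)             # [0, 100, 262, 1043, 2067, 154, 800, 1032, 0, 1043]
print(wrapped == desired)  # False
```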
Thanks in advance for any thoughts you have and please let me know if I can provide more details.
Here's an example of how you can do this by iterating over each row in a dataframe. I just created new data for the example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'TimeDelta': np.random.normal(900, 60, size=100)})
print(df.head())
    TimeDelta
0  971.021295
1  734.359861
2  867.000397
3  992.166539
4  853.281131
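So, loop over the rows, keeping a running total that resets at your desired maximum of 3000:
maxvalue = 3000
lastvalue = 0
newcum = []
for row in df.iterrows():
    # iterrows() yields (index, Series) tuples; row[1] holds the row's values
    thisvalue = row[1]['TimeDelta'] + lastvalue
    # reset the running total once it exceeds the maximum
    if thisvalue > maxvalue:
        thisvalue = 0
    newcum.append(thisvalue)
    lastvalue = thisvalue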
Then add the list newcum to the dataframe as a new column:
df['newcum'] = newcum
print(df.head())
    TimeDelta       newcum
0  801.977678   801.977678
1  893.296429  1695.274107
2  935.303566  2630.577673
3  850.719497     0.000000
4  951.554206   951.554206
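As a sanity check, the same loop can be wrapped in a small helper and run on the deltas from the question. This is only a sketch: the function name cumsum_with_reset and the list deltas_ms are made up for illustration, and it iterates over plain values instead of using iterrows().

```python
def cumsum_with_reset(values, maxvalue):
    """Running sum that drops back to 0 whenever it exceeds maxvalue."""
    total = 0
    result = []
    for value in values:
        total += value
        if total > maxvalue:
            # over the limit: discard the total and start again from 0
            total = 0
        result.append(total)
    return result


# TimeDelta values from the question, in milliseconds
deltas_ms = [0, 100, 162, 781, 4024, 1087, 3646, 232, 1968, 1043]
newcum = cumsum_with_reset(deltas_ms, 3000)
print(newcum)  # [0, 100, 262, 1043, 0, 1087, 0, 232, 2200, 0]
```

This matches the CumSum[ms] column the question asks for. Since iterating over a pandas Series yields its values, something like df['newcum'] = cumsum_with_reset(df['TimeDelta'], maxvalue) should work on the dataframe above as well.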