Pandas measure elapsed time when condition is true

Question

I have the following dataframe:

                 dt binary
2016-01-01 00:00:00  False
2016-01-01 00:00:01  False
2016-01-01 00:00:02  False
2016-01-01 00:00:03  False
2016-01-01 00:00:04   True
2016-01-01 00:00:05   True
2016-01-01 00:00:06   True
2016-01-01 00:00:07  False
2016-01-01 00:00:08  False
2016-01-01 00:00:09   True
2016-01-01 00:00:10   True

I would like to sum up the elapsed time when binary equals True. I am sharing my solution that implements it, but something tells me there should be an easier way, since this is a fairly basic time series operation. Note that the data is most likely equidistant, but I cannot rely on this.

df['binary_grp'] = (df.binary.diff(1) != False).astype(int).cumsum()
# Throw away False values
df = df[df.binary]
groupby = df.groupby('binary_grp')
df = pd.DataFrame({'timespan': groupby.dt.last() - groupby.dt.first()})
return df.timespan.sum().seconds / 60.0

The hardest part is probably the first line. It assigns an incremental number to each consecutive block of equal values. This is what the data looks like after that:

                 dt binary  binary_grp
2016-01-01 00:00:00  False           1
2016-01-01 00:00:01  False           1
2016-01-01 00:00:02  False           1
2016-01-01 00:00:03  False           1
2016-01-01 00:00:04   True           2
2016-01-01 00:00:05   True           2
2016-01-01 00:00:06   True           2
2016-01-01 00:00:07  False           3
2016-01-01 00:00:08  False           3
2016-01-01 00:00:09   True           4
2016-01-01 00:00:10   True           4
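
For readers who want to see the labelling step in isolation, here is a minimal sketch that splits it into two named steps. It uses shift and ne instead of the diff(1) != False expression from the question, and the names changed and block_id are made up for illustration.

```python
import pandas as pd

# The flags from the sample data in the question.
binary = pd.Series([False] * 4 + [True] * 3 + [False] * 2 + [True] * 2)

# True wherever a value differs from the previous row.
# The first row has no predecessor, so it always counts as a change.
changed = binary.ne(binary.shift())

# Running count of changes: rows in the same consecutive block share a number.
block_id = changed.cumsum()

print(pd.DataFrame({"binary": binary, "changed": changed, "block_id": block_id}))
```

The block_id column should match the binary_grp column shown above.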

Is there a better way to do this? I assume this code performs well enough; my concern is readability.

python pandas time-series

fodma1 · June 15, 2017 at 5:52

2 answers

jezrael · Answer 1 · 2017-06-15T05:57:14+0000

I think your solution is nice.

Another solution:

Compare the shifted values with ne and get the groups with cumsum.

After filtering, you can use apply and take the difference between the last and first values, selected with iloc:

df['binary_grp'] = (df.binary.ne(df.binary.shift())).cumsum()

df = df[df.binary]

s = df.groupby('binary_grp')['dt'].apply(lambda x: x.iloc[-1] - x.iloc[0])
print (s)
binary_grp
2   00:00:02
4   00:00:01
Name: dt, dtype: timedelta64[ns]

all_time =  s.sum().seconds / 60.0
print (all_time)
0.05

In your solution, a new DataFrame is not needed if you only need all_time:

groupby = df.groupby('binary_grp')

s = groupby.dt.last() - groupby.dt.first()
all_time =  s.sum().seconds / 60.0
print (all_time)
0.05

But if necessary, you can create one from the Series s using to_frame:

df1 = s.to_frame('timestamp')
print (df1)
           timestamp
binary_grp          
2           00:00:02
4           00:00:01
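
Putting the steps of this answer together, a self-contained sketch might look like the following. The sample frame is rebuilt by hand from the question, and the names true_rows, spans and minutes are illustrative. One deliberate change: it calls total_seconds() instead of reading .seconds, because .seconds holds only the seconds component of a Timedelta and would drop whole days if the summed time exceeded 24 hours.

```python
import pandas as pd

# Sample data from the question: one row per second, flags as shown there.
df = pd.DataFrame({
    "dt": pd.Timestamp("2016-01-01") + pd.to_timedelta(list(range(11)), unit="s"),
    "binary": [False] * 4 + [True] * 3 + [False] * 2 + [True] * 2,
})

# Label consecutive runs of equal values, then keep only the True rows.
df["binary_grp"] = df["binary"].ne(df["binary"].shift()).cumsum()
true_rows = df[df["binary"]]

# Span of each True run: last timestamp minus first timestamp.
grouped = true_rows.groupby("binary_grp")["dt"]
spans = grouped.last() - grouped.first()

# total_seconds() includes whole days; .seconds is only the seconds component.
minutes = spans.sum().total_seconds() / 60.0
print(spans)
print(minutes)
```

For the sample data the two approaches agree, since the total is only a few seconds.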

piRSquared · Answer 2 · 2017-06-15T07:10:00+0000

IIUC:

You want to find the total time covered by the series where binary is True.

However, we have to make some choices or assumptions:

                    dt  binary
0  2016-01-01 00:00:00   False
1  2016-01-01 00:00:01   False
2  2016-01-01 00:00:02   False
3  2016-01-01 00:00:03   False
4  2016-01-01 00:00:04    True # <- This is where time starts
5  2016-01-01 00:00:05    True
6  2016-01-01 00:00:06    True
7  2016-01-01 00:00:07   False # <- And ends here. So this would
8  2016-01-01 00:00:08   False # be 00:00:07 - 00:00:04 or 3 seconds
9  2016-01-01 00:00:09    True # <- Starts again
10 2016-01-01 00:00:10    True # <- But ends here because
                               # I don't have another Timestamp

With these assumptions, we can use diff, multiply, and sum:

df.dt.diff().shift(-1).mul(df.binary).sum()

Timedelta('0 days 00:00:04')
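
The same interval convention can be sketched as a self-contained script. This is an illustration under the assumptions stated above, not the answer's exact code: it selects the True rows with a boolean mask instead of multiplying by the boolean column, and the names gap_to_next and elapsed are made up here.

```python
import pandas as pd

# Sample data from the question: one row per second, flags as shown there.
df = pd.DataFrame({
    "dt": pd.Timestamp("2016-01-01") + pd.to_timedelta(list(range(11)), unit="s"),
    "binary": [False] * 4 + [True] * 3 + [False] * 2 + [True] * 2,
})

# Time from each row to the next row. The last row has no successor, so it is NaT.
gap_to_next = df["dt"].diff().shift(-1)

# Keep only the gaps that start on a True row. sum() skips the trailing NaT.
elapsed = gap_to_next[df["binary"]].sum()
print(elapsed)
```

Because each True row is counted up to the next timestamp, this gives 4 seconds for the sample data, where the first-to-last approach gives 3. It also works when the timestamps are not equally spaced.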

We can use this concept together with groupby

# Use xor and cumsum to identify change in True to False and False to True
grps = (df.binary ^ df.binary.shift()).cumsum()
mask = df.binary.groupby(grps).first()
df.dt.diff().shift(-1).groupby(grps).sum()[mask]

binary
1   00:00:03
3   00:00:01
Name: dt, dtype: timedelta64[ns]

Or without a mask

pd.concat([df.dt.diff().shift(-1).groupby(grps).sum(), mask], axis=1)

             dt  binary
binary                 
0      00:00:04   False
1      00:00:03    True
2      00:00:02   False
3      00:00:01    True
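
A per-run breakdown under the same convention can also be sketched by filtering to the True rows before grouping. This is an assumption-laden variant rather than the code above: it labels the runs with ne and shift instead of xor, so the labels start at 1 rather than 0, and the names run_id, is_true and per_run are illustrative.

```python
import pandas as pd

# Sample data from the question: one row per second, flags as shown there.
df = pd.DataFrame({
    "dt": pd.Timestamp("2016-01-01") + pd.to_timedelta(list(range(11)), unit="s"),
    "binary": [False] * 4 + [True] * 3 + [False] * 2 + [True] * 2,
})

# Time from each row to the next row (NaT for the last row).
gap_to_next = df["dt"].diff().shift(-1)

# Label consecutive runs of equal values.
run_id = df["binary"].ne(df["binary"].shift()).cumsum()

# Sum the gaps within each True run only.
is_true = df["binary"]
per_run = gap_to_next[is_true].groupby(run_id[is_true]).sum()
print(per_run)
```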
