Pandas: computing differences between rows after grouping and reindexing
I have a pandas DataFrame with several thousand rows that looks like this:
x.head()
id jname wbdqueue_id startdatetime \
59 1341127 ondemand_build_baspen-w7g 26581 2017-07-31 23:14:56
60 1341126 ondemand_qa_qforchecka 26581 2017-07-31 23:15:35
61 1341125 ondemand_build_bchecka 26581 2017-07-31 23:14:56
63 1341123 ondemand_build_baspen-w7f 26581 2017-07-31 23:10:05
64 1341122 ondemand_update_waspen-w7a 26581 2017-07-31 23:09:32
enddatetime
59 2017-07-31 23:19:12
60 2017-07-31 23:34:12
61 2017-07-31 23:15:30
63 2017-07-31 23:14:56
64 2017-07-31 23:10:00
For each wbdqueue_id, I would like to get the difference between the startdatetime of ondemand_update_waspen-w7a and the enddatetime of ondemand_build_baspen-w7g. How can I do this?
I have read in the CSV file and parsed both startdatetime and enddatetime as datetimes. Then I grouped by wbdqueue_id. My thought was to index each group by jname so that I could look up the start and end timestamps for the two jnames I need. But when I do this, all other values become NaN (or NaT for the time columns).
-Sachin
I would write a function that implements the described logic, which keeps things clear and makes the code easy to follow:
import pandas as pd

def get_time_diff(dff):
    start_time = dff[dff.jname.eq('ondemand_update_waspen-w7a')].startdatetime.values[0]
    end_time = dff[dff.jname.eq('ondemand_build_baspen-w7g')].enddatetime.values[0]
    return pd.Timedelta(end_time - start_time)
Then you can use the function in a group-by operation:
df.groupby('wbdqueue_id').apply(get_time_diff)
This gives:
wbdqueue_id
26581 00:09:40
dtype: timedelta64[ns]
Note that I compute end_time - start_time because you presumably want a positive duration: in this data the build job ends after the update job starts.
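One caveat with the function above: .values[0] raises an IndexError if a wbdqueue_id group lacks one of the two jobs. A slightly more defensive variant might look like the sketch below. The name get_time_diff_safe and its start_job/end_job parameters are hypothetical, and the sample DataFrame is rebuilt by hand from the x.head() output in the question.

```python
import pandas as pd

# Sample rows copied from the x.head() output in the question.
df = pd.DataFrame(
    {
        "id": [1341127, 1341126, 1341125, 1341123, 1341122],
        "jname": [
            "ondemand_build_baspen-w7g",
            "ondemand_qa_qforchecka",
            "ondemand_build_bchecka",
            "ondemand_build_baspen-w7f",
            "ondemand_update_waspen-w7a",
        ],
        "wbdqueue_id": [26581] * 5,
        "startdatetime": pd.to_datetime([
            "2017-07-31 23:14:56", "2017-07-31 23:15:35", "2017-07-31 23:14:56",
            "2017-07-31 23:10:05", "2017-07-31 23:09:32",
        ]),
        "enddatetime": pd.to_datetime([
            "2017-07-31 23:19:12", "2017-07-31 23:34:12", "2017-07-31 23:15:30",
            "2017-07-31 23:14:56", "2017-07-31 23:10:00",
        ]),
    },
    index=[59, 60, 61, 63, 64],
)


def get_time_diff_safe(
    dff,
    start_job="ondemand_update_waspen-w7a",  # job whose start time is used
    end_job="ondemand_build_baspen-w7g",     # job whose end time is used
):
    """Hypothetical helper: return end - start, or NaT if either job is missing."""
    starts = dff.loc[dff["jname"].eq(start_job), "startdatetime"]
    ends = dff.loc[dff["jname"].eq(end_job), "enddatetime"]
    if starts.empty or ends.empty:
        return pd.NaT
    # If a job appears more than once in a group, only its first row is used.
    return ends.iloc[0] - starts.iloc[0]


# Selecting the needed columns keeps the grouping column out of each group.
result = (
    df.groupby("wbdqueue_id")[["jname", "startdatetime", "enddatetime"]]
    .apply(get_time_diff_safe)
)
print(result)
```

Groups that lack either job come back as NaT instead of stopping the whole apply, so they can be dropped or inspected afterwards.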
I hope this serves the purpose.
If you only want the difference between startdatetime and enddatetime in your result, you can try this.
df1 = df.loc[df.jname.isin(['ondemand_update_waspen-w7a', 'ondemand_build_baspen-w7f']), :]
df1.groupby('wbdqueue_id').apply(lambda x: x.startdatetime - x.enddatetime.shift())[4]
Out[467]:
wbdqueue_id
26581.0 -1 days +23:55:00
Name: 4, dtype: timedelta64[ns]
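The idea from the question of indexing by jname can also be made to work without apply. The question does not show the code that produced the NaN/NaT values, but if it used reindex, that would explain them: reindex looks up labels in the existing index and fills missing ones with NaN/NaT, whereas set_index moves a column into the index. Here is a minimal sketch, assuming each jname occurs at most once per wbdqueue_id; the constants START_JOB and END_JOB are hypothetical names, and the sample rows are copied from the question.

```python
import pandas as pd

# Sample rows copied from the x.head() output in the question.
df = pd.DataFrame(
    {
        "jname": [
            "ondemand_build_baspen-w7g",
            "ondemand_qa_qforchecka",
            "ondemand_build_bchecka",
            "ondemand_build_baspen-w7f",
            "ondemand_update_waspen-w7a",
        ],
        "wbdqueue_id": [26581] * 5,
        "startdatetime": pd.to_datetime([
            "2017-07-31 23:14:56", "2017-07-31 23:15:35", "2017-07-31 23:14:56",
            "2017-07-31 23:10:05", "2017-07-31 23:09:32",
        ]),
        "enddatetime": pd.to_datetime([
            "2017-07-31 23:19:12", "2017-07-31 23:34:12", "2017-07-31 23:15:30",
            "2017-07-31 23:14:56", "2017-07-31 23:10:00",
        ]),
    },
    index=[59, 60, 61, 63, 64],
)

START_JOB = "ondemand_update_waspen-w7a"  # hypothetical constant names
END_JOB = "ondemand_build_baspen-w7g"

# Move the two key columns into a MultiIndex (set_index, not reindex).
# Assumes each jname occurs at most once per wbdqueue_id.
indexed = df.set_index(["wbdqueue_id", "jname"])

# Take the cross-section for each job; both results are indexed by wbdqueue_id.
starts = indexed["startdatetime"].xs(START_JOB, level="jname")
ends = indexed["enddatetime"].xs(END_JOB, level="jname")

# Subtraction aligns on wbdqueue_id; ids missing one of the jobs yield NaT.
diff = ends - starts
print(diff)
```

Because the subtraction aligns on wbdqueue_id, this handles every queue in a single vectorized step rather than calling a Python function per group.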