Count if: the task is in a certain time interval
I have a dataframe df1 that contains three columns:
No.  Start Time         End Time
1    07/28/15 08:03 AM  07/28/15 08:09 AM
2    07/28/15 08:06 AM  07/28/15 08:12 AM
The start and end times mark the beginning and end of a specific task. I want to build a new data frame that counts the number of active tasks at each time of a given day, like this:
Hours Number of tasks
0:00
0:01
..
..
11:59
The data frame should show, for every minute of the day, how many tasks are active. A task that starts at 8:03 AM and ends at 8:09 AM should count towards the following minutes (it ends at 8:09 AM, so it is no longer active at 8:09 AM):
8:03
8:04
8:05
8:06
8:07
8:08
How can I do this in an easy way?
1 answer
Not a pandas solution, but you can loop and filter.
A quick, approximate version that counts per hour:
import datetime

from pandas import DataFrame

# Each job is a (start, end) pair.
jobs = [
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 8, 9)),
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 8, 58)),
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 10, 3)),
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 9, 3)),
    # End precedes start here, so this job is counted only in its start hour.
    (datetime.datetime(2015, 7, 28, 10, 3), datetime.datetime(2015, 7, 28, 8, 3)),
]

data = {'hour': [], 'active_jobs': []}
for hour in range(24):
    current_active_jobs = 0
    for job in jobs:
        if job[0].hour == hour:
            # The job starts during this hour.
            current_active_jobs += 1
        elif job[0].hour < hour and job[1].hour >= hour:
            # The job started in an earlier hour and has not ended before this one.
            current_active_jobs += 1
    data['hour'].append(hour)
    data['active_jobs'].append(current_active_jobs)

# Only the hour fields are compared (dates and minutes are ignored), hence "approximate".
print(DataFrame(data))
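The same loop-and-filter idea can be taken down to the minute resolution the question asks for. The following is only a sketch under some assumptions: the function name `active_per_minute` and the `FMT` constant are made up for illustration, timestamps are assumed to fall on whole minutes, and a task is treated as active from its start minute up to, but not including, its end minute, as described in the question.

```python
from datetime import datetime, timedelta

# Assumed format of the timestamps shown in the question, e.g. "07/28/15 08:03 AM".
FMT = "%m/%d/%y %I:%M %p"


def active_per_minute(jobs, day):
    """Return a list of ("HH:MM", count) pairs for the 1440 minutes of `day`.

    `jobs` is an iterable of (start, end) datetimes. A job counts for every
    minute in the half-open interval [start, end).
    """
    midnight = datetime.combine(day, datetime.min.time())
    counts = [0] * (24 * 60)
    for start, end in jobs:
        # Convert to minute offsets from midnight and clip to this day.
        first = max(0, int((start - midnight).total_seconds() // 60))
        last = min(24 * 60, int((end - midnight).total_seconds() // 60))
        for minute in range(first, last):
            counts[minute] += 1
    return [
        ((midnight + timedelta(minutes=i)).strftime("%H:%M"), count)
        for i, count in enumerate(counts)
    ]


# The two sample rows from the question.
rows = [
    ("07/28/15 08:03 AM", "07/28/15 08:09 AM"),
    ("07/28/15 08:06 AM", "07/28/15 08:12 AM"),
]
jobs = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in rows]

table = active_per_minute(jobs, jobs[0][0].date())
for label, count in table:
    if count:
        print(label, count)
```

If a data frame is wanted, the list of pairs can be passed to `DataFrame(table, columns=['Hours', 'Number of tasks'])`. For many jobs, marking +1 at each start minute and -1 at each end minute and then taking a running sum would avoid the inner loop.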