Create a new column based on an existing time column in a dataframe
I need to create a shift column based on an existing time column.
For example, I have a dataframe df1 with details:
time
0 10:30
1 13:50
2 19:20
3 14:10
I need a dataframe that looks like this with a shift:
- from 8:30 to 12:30 = shift 1,
- 12:30 to 20:30 = shift 2
- 20:30 to 8:30 = shift 3
time shift
0 10:30 1
1 13:50 2
2 19:20 2
3 23:10 3
The following uses a shifts dictionary to determine the shift associated with a given time:
import pandas as pd
df = pd.DataFrame({'time': ['00:00','08:29', '08:30', '08:31', '12:29', '12:30', '12:31', '20:29', '20:30', '20:31', '23:59', '10:30', '13:50', '19:20', '14:10', '23:10']})
# Convert the time column into datetime.time objects
df.time = pd.to_datetime(df.time).dt.time
# Set up a shifts dictionary
shifts = {('8:30', '12:30'): 1 , ('12:30', '20:30'): 2, ('20:30', '8:30'): 3}
# Convert the keys to datetime objects
shifts = {tuple(map(pd.to_datetime, k)):v for k,v in shifts.items()}
# Expand the datetime objects beyond one day if the second element occurred after the first element
shifts = {(k if k[0].time() < k[1].time() else (k[0],k[1]+pd.to_timedelta('1day'))):v for k,v in shifts.items()}
# Determine shift
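def get_shift(time):
    try:
        # inclusive='left' replaces closed='left', which was removed from date_range in pandas 2.0;
        # freq='min' is the same one-minute step as '60S'
        return shifts.get([k for k in shifts if time in pd.date_range(*k, freq='min', inclusive='left').time][0])
    except IndexError:
        # No interval contained this time
        return 'No Shift'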
# Use .apply on the time column to get the shift column
df['shift'] = df.time.apply(get_shift)
print(df)
Outputs:
# time shift
# 0 00:00:00 3
# 1 08:29:00 3
# 2 08:30:00 1
# 3 08:31:00 1
# 4 12:29:00 1
# 5 12:30:00 2
# 6 12:31:00 2
# 7 20:29:00 2
# 8 20:30:00 3
# 9 20:31:00 3
# 10 23:59:00 3
# 11 10:30:00 1
# 12 13:50:00 2
# 13 19:20:00 2
# 14 14:10:00 2
# 15 23:10:00 3
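The same lookup can also be written without building a minute-by-minute date range for every row. This is only a sketch, not the code above: the names SHIFTS and shift_for are made up for illustration, and it assumes half-open intervals [start, end), with an interval whose end is earlier than its start treated as wrapping past midnight.

```python
from datetime import time

# Hypothetical table of (start, end, label); each interval is half-open: [start, end)
SHIFTS = [
    (time(8, 30), time(12, 30), 1),
    (time(12, 30), time(20, 30), 2),
    (time(20, 30), time(8, 30), 3),  # end < start, so this one wraps past midnight
]

def shift_for(t, shifts=SHIFTS):
    """Return the label of the first interval containing t, or 'No Shift'."""
    for start, end, label in shifts:
        if start < end:
            inside = start <= t < end
        else:
            # Wrapping interval: late evening OR early morning
            inside = t >= start or t < end
        if inside:
            return label
    return 'No Shift'

samples = [time(0, 0), time(8, 29), time(8, 30), time(12, 30), time(20, 30), time(23, 59)]
print([shift_for(t) for t in samples])
```

With a column of datetime.time objects this would be applied as df['shift'] = df.time.apply(shift_for).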
You can accomplish this with apply, using a function that computes the shift column.
import datetime
def check_shift(row):
    # Look the value up by column name; positional row[0] is deprecated in recent pandas
    shift_time = row['time']
    if datetime.time(8, 30) <= shift_time <= datetime.time(12, 30):
        return 1
    elif datetime.time(12, 30) < shift_time <= datetime.time(20, 30):
        return 2
    else:
        return 3
df['shift'] = df.apply(check_shift, axis='columns')
This produces the following dataframe:
time shift
0 10:30:00 1
1 13:50:00 2
2 19:20:00 2
3 14:10:00 2
If we change the last time to 23:10 (as in your sample output), we get the following:
time shift
0 10:30:00 1
1 13:50:00 2
2 19:20:00 2
3 23:10:00 3
An important note here: I converted the time column from strings to actual time objects:
df['time'] = pd.to_datetime(df['time'], format="%H:%M").dt.time
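The order of the steps matters: the conversion has to run before the apply, because comparing a plain string with datetime.time raises a TypeError. Here is a self-contained sketch putting the pieces together. It keeps the same boundaries as above, but as an assumption of mine it applies the function to the time column directly rather than row by row, and it uses the sample times with 23:10 as the last value.

```python
import datetime
import pandas as pd

# Sample data with 23:10 as the last time
df = pd.DataFrame({'time': ['10:30', '13:50', '19:20', '23:10']})

# Convert first: the comparisons below need datetime.time objects, not strings
df['time'] = pd.to_datetime(df['time'], format='%H:%M').dt.time

def check_shift(t):
    # Same boundaries as the row-based version: both ends of shift 1 are inclusive
    if datetime.time(8, 30) <= t <= datetime.time(12, 30):
        return 1
    elif datetime.time(12, 30) < t <= datetime.time(20, 30):
        return 2
    return 3

# Series.apply passes each time value directly, so no row indexing is needed
df['shift'] = df['time'].apply(check_shift)
print(df)
```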
Assuming we have the following DF:
In [380]: df
Out[380]:
time
0 00:00
1 08:29
2 08:30
3 08:31
4 12:29
5 12:30
6 12:31
7 20:29
8 20:30
9 20:31
10 23:59
In [381]: df.dtypes
Out[381]:
time object
dtype: object
Consider this solution:
In [382]: bins = [-1, 830, 1230, 2030, 2400]
...: labels = [0,1,2,3]
...: df['shift'] = pd.cut(df.time.str.replace(':','').astype(int),
...: bins=bins, labels=labels, right=False)
...: df.loc[df['shift']==0, 'shift'] = 3
...:
In [383]: df
Out[383]:
time shift
0 00:00 3
1 08:29 3
2 08:30 1
3 08:31 1
4 12:29 1
5 12:30 2
6 12:31 2
7 20:29 2
8 20:30 3
9 20:31 3
10 23:59 3
Explanation:
- First we convert time to a numeric value, for example 08:29 → 829, 12:31 → 1231, etc.
- Now we can cut them into 4 bins (shifts): [0,1,2,3]. NOTE: labels must be unique, so we could not specify [3,1,2,3].
- Finally, we have to change 0 → 3, because we had to split the interval 20:30 - 08:30 in two: 00:00 - 08:30 and 20:30 - 23:59:59.
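On pandas 1.1 or later the relabeling step can probably be skipped, since pd.cut accepts duplicate labels when ordered=False is passed. A minimal sketch, assuming that pandas version; the shorter sample frame and the as_int name are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'time': ['00:00', '08:29', '08:30', '12:29',
                            '12:30', '20:29', '20:30', '23:59']})

# '08:29' -> 829, '12:31' -> 1231, etc.
as_int = df['time'].str.replace(':', '', regex=False).astype(int)

# With ordered=False the label 3 may appear twice, so both halves of the
# overnight shift (00:00-08:30 and 20:30-24:00) map straight to 3
df['shift'] = pd.cut(as_int,
                     bins=[0, 830, 1230, 2030, 2400],
                     labels=[3, 1, 2, 3],
                     right=False,
                     ordered=False)
print(df)
```

The resulting column is an unordered categorical, so comparisons like df['shift'] > 1 will not work without converting it first, e.g. with astype(int).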