Strip time from an object date in pandas

I am having problems with some dates from zipped xlsx files. These files are loaded into a SQLite database and then exported as .csv. Each daily file is about 40,000 lines. The problem I'm running into is that pd.to_datetime doesn't seem to work on these objects (I guess dates in the Excel format cause the problem; pure .csv files work fine with this command). This is actually OK, since I don't need them to be in datetime format.

I am trying to create a ShortDate column in the format %m/%d/%Y. How do I do this on an object date (format mm/dd/yyyy hh:mm:ss, from Excel)? Next, I'll create a new column named RosterID that concatenates the EmployeeID field and the ShortDate field into a unique ID.

I am very new to pandas and currently I only use it to process .csv files (renaming and selecting specific columns, creating unique ids for use in filters in Tableau, etc.).

rep = pd.read_csv(r'C:\Users\Desktop\test.csv.gz', dtype = 'str', compression = 'gzip', usecols = ['etc','etc2'])
print('Read successfully.')
rep['Total']=1
rep['UniqueID']= rep['EmployeeID'] + rep['InteractionID']
rep['ShortDate'] = ??? #what do I do here to get what I am looking for?
rep['RosterID']= rep['EmployeeID'] + rep['ShortDate'] # this is my goal
print('Modified successfully.')

      

Here is some of the raw data from the .csv. Column names would be

InteractionID, Created Date, EmployeeID, Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19

      

+3




3 answers


Create a new column, then apply simple datetime functions using lambda and apply.

In [14]: df['Short Date']= pd.to_datetime(df['Created Date'])

In [15]: df
Out[15]: 
   InteractionID    Created Date EmployeeID     Repeat Date  \
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06   

           Short Date  
0 2014-04-01 14:05:00  
1 2014-04-01 13:44:00  
2 2014-04-01 17:28:00  

In [16]: df['Short Date'] = df['Short Date'].apply(lambda x:x.date().strftime('%m%d%y'))

In [17]: df
Out[17]: 
   InteractionID    Created Date EmployeeID     Repeat Date Short Date  
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50     040114   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47     040114   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06     040114

      



Then just join the two columns. Convert the Short Date column to strings to avoid errors when concatenating strings and integers.

In [32]: df['Roster ID'] = df['EmployeeID'] + df['Short Date'].map(str)

In [33]: df
Out[33]: 
   InteractionID    Created Date EmployeeID     Repeat Date Short Date  \
0           7927  4/1/2014 14:05       912a  4/1/2014 14:50     040114   
1           2158  4/1/2014 13:44       172r  4/4/2014 17:47     040114   
2          44279  4/1/2014 17:28       217y  4/7/2014 22:06     040114   

    Roster ID  
0  912a040114  
1  172r040114  
2  217y040114 
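The session above uses the format '%m%d%y', while the question asks for %m/%d/%Y. A minimal sketch of the same apply and lambda approach with the requested format, assuming the timestamps always look like the sample rows (the explicit format string passed to pd.to_datetime is my assumption, not part of the answer):

```python
import pandas as pd

# Sample rows from the question, kept as strings.
df = pd.DataFrame({
    'EmployeeID': ['912a', '172r', '217y'],
    'Created Date': ['04/01/2014 14:05:10',
                     '04/01/2014 13:44:05',
                     '04/01/2014 17:28:36'],
})

# Parse with an explicit format (assumed from the sample data).
parsed = pd.to_datetime(df['Created Date'], format='%m/%d/%Y %H:%M:%S')

# Same apply/lambda idea as above, but with slashes and a 4-digit year.
df['ShortDate'] = parsed.apply(lambda x: x.strftime('%m/%d/%Y'))

# Both columns are strings, so + concatenates them.
df['RosterID'] = df['EmployeeID'] + df['ShortDate']
print(df[['ShortDate', 'RosterID']])
```

Because strftime already returns strings, the extra .map(str) is not needed here.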

      

+6




You can apply a post-processing step that first converts the string to a date-time and then applies a lambda to keep only the date part:

In [29]:

df['Created Date'] = pd.to_datetime(df['Created Date']).apply(lambda x: x.date())
df['Repeat Date'] = pd.to_datetime(df['Repeat Date']).apply(lambda x: x.date())
df


Out[29]:
   InteractionID Created Date EmployeeID Repeat Date
0           7927   2014-04-01       912a  2014-04-01
1           2158   2014-04-01       172r  2014-04-04
2          44279   2014-04-01       217y  2014-04-07

      

EDIT

Revisiting this: you can access just the date component with dt.date if your pandas version is 0.15.0 or later:

In [18]:
df['just_date'] = df['Repeat Date'].dt.date
df

Out[18]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date  
0  2014-04-01  
1  2014-04-04  
2  2014-04-07  
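Note that the .dt accessor only works on a column that has already been parsed to datetime, not on the raw strings. Since the question mentions that pd.to_datetime did not work on the Excel-derived values, here is a hedged sketch of one way to find the offending rows: pass an explicit format and errors='coerce', so unparseable values become NaT instead of raising. The 'not a date' value is invented for illustration.

```python
import pandas as pd

# Two values from the question plus one deliberately bad value (hypothetical).
s = pd.Series(['04/01/2014 14:50:03', '04/04/2014 17:47:29', 'not a date'])

# errors='coerce' turns anything that does not match the format into NaT.
parsed = pd.to_datetime(s, format='%m/%d/%Y %H:%M:%S', errors='coerce')

# Rows that failed to parse, for inspection.
bad_rows = s[parsed.isna()]
print(bad_rows)

# .dt works now because parsed is a datetime column.
just_date = parsed.dt.date
print(just_date)
```

Inspecting bad_rows on the real file should show what the problematic values actually look like.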

      



Also, you can now use dt.strftime instead of apply to achieve the desired result:

In [28]:
df['short_date'] = df['Repeat Date'].dt.strftime('%m%d%Y')
df

Out[28]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date short_date  
0  2014-04-01   04012014  
1  2014-04-04   04042014  
2  2014-04-07   04072014  

      

So generating the Roster ID is now a trivial matter of concatenating two columns:

In [30]:
df['Roster ID'] = df['EmployeeID'] + df['short_date']
df

Out[30]:
   InteractionID        Created Date EmployeeID         Repeat Date  \
0           7927 2014-04-01 14:05:10       912a 2014-04-01 14:50:03   
1           2158 2014-04-01 13:44:05       172r 2014-04-04 17:47:29   
2          44279 2014-04-01 17:28:36       217y 2014-04-07 22:06:19   

    just_date short_date     Roster ID  
0  2014-04-01   04012014  912a04012014  
1  2014-04-04   04042014  172r04042014  
2  2014-04-07   04072014  217y04072014  
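Putting the pieces together, here is a sketch of the whole pipeline from the question using dt.strftime. It assumes the sample rows are representative; the inline CSV text stands in for the real gzipped file, and the header is written without spaces after the commas. Reading with dtype=str, as the question does, keeps the leading zeros in InteractionID that are lost in the outputs above.

```python
import io
import pandas as pd

# Inline stand-in for the real file (assumption: same columns and format).
csv_text = """InteractionID,Created Date,EmployeeID,Repeat Date
07927,04/01/2014 14:05:10,912a,04/01/2014 14:50:03
02158,04/01/2014 13:44:05,172r,04/04/2014 17:47:29
44279,04/01/2014 17:28:36,217y,04/07/2014 22:06:19
"""

# dtype=str keeps IDs such as 07927 intact.
rep = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Parse once, then format as mm/dd/yyyy without the time.
created = pd.to_datetime(rep['Created Date'], format='%m/%d/%Y %H:%M:%S')
rep['ShortDate'] = created.dt.strftime('%m/%d/%Y')

# All three inputs are strings, so + concatenates.
rep['UniqueID'] = rep['EmployeeID'] + rep['InteractionID']
rep['RosterID'] = rep['EmployeeID'] + rep['ShortDate']
print(rep[['UniqueID', 'ShortDate', 'RosterID']])
```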

      

+7




You can also do this using only standard libraries (in whatever format you want: "%m/%d/%Y", "%m-%d-%Y", or other orders / formats):

In [118]:

import time
df['Created Date'] = df['Created Date'].apply(lambda x: time.strftime('%m/%d/%Y', time.strptime(x, '%m/%d/%Y %H:%M:%S')))
In [120]:

print(df)
   InteractionID Created Date EmployeeID          Repeat Date
0           7927   04/01/2014       912a  04/01/2014 14:50:03
1           2158   04/01/2014       172r  04/04/2014 17:47:29
2          44279   04/01/2014       217y  04/07/2014 22:06:19
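Since the question says the values do not need to be real datetimes, another option (not part of this answer) is to skip parsing and just cut the string at the space. A minimal sketch, assuming every value has exactly the form "mm/dd/yyyy hh:mm:ss"; it does no validation, so a malformed value passes through silently.

```python
import pandas as pd

df = pd.DataFrame({
    'Created Date': ['04/01/2014 14:05:10', '04/01/2014 13:44:05'],
})

# Split on the space and keep the first piece (the date part).
df['ShortDate'] = df['Created Date'].str.split(' ').str[0]
print(df)
```

This avoids pd.to_datetime entirely, which may help if parsing is what fails on the Excel-derived values.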

      

0

