Parsing date and setting timezone in pandas dataframes

I have about 800,000 rows of data in a dataframe, and one column, df['Date'], holds date-time strings of the form 'YYYY-MM-DD HH:MM:SS.fff' with no time zone information. However, I know they are in the America/New_York time zone and need to be converted to CET. I have two ways to get the job done:

Method 1 (very slow):

import datetime
from pytz import timezone

df['Date'].apply(lambda x: timezone('America/New_York')\
            .localize(datetime.datetime.strptime(x,'%Y%m%d%H:%M:%S.%f'))\
            .astimezone(timezone('CET')))

      

Method 2:

df.index = pd.to_datetime(df['Date'],format='%Y%m%d%H:%M:%S.%f')
# tz_localize and tz_convert return a new index, so assign the result back
df.index = df.index.tz_localize('America/New_York').tz_convert('CET')

      

I'm wondering whether there are other ways to do this, or any potential pitfalls in the methods I listed. Thanks!

Also, I would like to shift every timestamp by a fixed amount of time, for example 1 ms. How can I do that with method 2? With timedelta(0,0,1000)?

1 answer


Method 2 is by far the best way to do this.
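
For reference, here is a minimal sketch of method 2 with the converted index assigned back and the 1 ms shift from the question applied. The sample rows are made up, and the format string assumes the strings really look like 'YYYY-MM-DD HH:MM:SS.fff'; adjust it if your data matches the format used in the question's code instead.

```python
import pandas as pd

# Hypothetical sample data; format assumed to be 'YYYY-MM-DD HH:MM:SS.fff'.
df = pd.DataFrame({"Date": ["2014-01-15 09:30:00.123", "2014-07-15 16:00:00.456"]})

# Parse the naive strings in one vectorized call.
naive = pd.to_datetime(df["Date"], format="%Y-%m-%d %H:%M:%S.%f")

# Attach the New York zone, convert to CET, and assign the result back.
df.index = pd.DatetimeIndex(naive).tz_localize("America/New_York").tz_convert("CET")

# Shift every timestamp by a fixed 1 ms.
df.index = df.index + pd.Timedelta(milliseconds=1)
```

pd.Timedelta(milliseconds=1) is the same interval as datetime.timedelta(0, 0, 1000), and adding either one to a DatetimeIndex shifts all rows at once.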

However, it looks like you are converting the dates after loading the data.

It is much faster (not to mention cleaner) to parse dates while the file is being loaded than to convert them afterwards.

If your data is loaded from a CSV file with pandas.read_csv(), you can use the parse_dates= and date_parser= options.

You can pass your lambda function directly as date_parser= and set parse_dates= to the list of date columns.

Like this:

pd.read_csv('myfile.csv', parse_dates=['Date'], date_parser=lambda x: timezone('America/New_York')\
        .localize(datetime.datetime.strptime(x,'%Y%m%d%H:%M:%S.%f'))\
        .astimezone(timezone('CET')))

      

This should work and will probably be the fastest.
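
A variation worth trying, sketched here under assumptions rather than taken from the original code: let read_csv parse the column with parse_dates= alone, then localize and convert the whole column in one vectorized step. This avoids date_parser=, which is deprecated in recent pandas releases. The in-memory CSV and the Price column are hypothetical stand-ins for 'myfile.csv', and the strings are assumed to be in 'YYYY-MM-DD HH:MM:SS.fff' form.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for 'myfile.csv'.
csv_text = (
    "Date,Price\n"
    "2014-01-15 09:30:00.123,10.5\n"
    "2014-07-15 16:00:00.456,11.0\n"
)

# parse_dates converts the column to naive datetimes while the file is read.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Localize and convert the whole column at once instead of row by row.
df["Date"] = df["Date"].dt.tz_localize("America/New_York").dt.tz_convert("CET")
```

If you want the timestamps as the index, as in method 2, df.set_index("Date") after the conversion does that.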
