Adding timezone to time in pandas dataframe

I have a column containing times as seconds since the epoch. These times are in UTC, but pandas doesn't know that. I would like to add this information.

df_data['transaction_creation_date']

0        1484161304
1        1489489785
2        1489161124
3        1488904824
4        1484908677
5        1485942900
6        1490854506
7        1485895432
8        1485975392
9        1489266328
10       1488525196
11       1490363033
12       1490617794
13       1486560642
14       1487170224
15       1484923852

      

So, I am doing something like this:

df_times = pd.DatetimeIndex(pd.to_datetime(df_data['transaction_creation_date'], unit='s'))
df_times = df_times.tz_localize(pytz.utc)

      

When I print the timestamps stored in df_times, I get:

print(df_times.strftime('%s'))

['1484157704' '1489486185' '1489157524' ..., '1490684098' '1490284646'
 '1489602636']

      

My UTC time in row 0 is 1484161304. After I added the timezone information, it changed to 1484157704.

My time zone is "Europe/Warsaw", and the difference between my time zone and UTC is 3600 seconds. Likewise, 1484161304 - 1484157704 = 3600.

So it seems pandas treated my UTC times as "Europe/Warsaw" local times and shifted them back by one hour to convert them to UTC, which corrupted my data.

How do I set the UTC time zone for my time to prevent this from happening?



1 answer


I was unable to reproduce your results, but I used a slightly different method to display the resulting timestamps. I did not use the %s format code, which is platform-specific and is not among the format codes Python documents for strftime. Instead, I computed the number of seconds since the UTC epoch directly:

Code:

utc_at_epoch = pytz.utc.localize(dt.datetime(1970, 1, 1))
for t in df_times.tz_localize(pytz.utc):
    print(int((t - utc_at_epoch).total_seconds()))
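
A plausible explanation for the one-hour shift, offered as an assumption rather than something verified on the asker's machine: on glibc-based systems, %s is computed by handing the timestamp's wall-clock fields to the C library, which reads them as local time and ignores the attached UTC offset. The sketch below reproduces that effect using only the standard library. It assumes a Unix-like system where time.tzset() is available, and it uses a POSIX TZ string as a stand-in for Europe/Warsaw.

```python
import calendar
import datetime as dt
import os
import time

# Assumption: Unix-like system (time.tzset() does not exist on Windows).
# This POSIX TZ string approximates Europe/Warsaw (CET in winter, CEST in summer).
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()

original = 1484161304  # row 0 from the question
aware = dt.datetime.fromtimestamp(original, tz=dt.timezone.utc)

# timetuple() keeps only the wall-clock fields; the UTC offset is lost.
fields = aware.timetuple()

# mktime() reads those fields as *local* time, which mimics what %s does.
as_local = int(time.mktime(fields))

# timegm() reads the same fields as UTC, which is what was intended.
as_utc = calendar.timegm(fields)

# timestamp() on an aware datetime respects the offset directly.
via_timestamp = int(aware.timestamp())

print(as_local, as_utc, via_timestamp)
print(as_utc - as_local)
```

If this is what happened, the values stored in df_times were never altered; only the strftime('%s') rendering was off by the local UTC offset.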

      

Test code:



import pandas as pd
import datetime as dt
import pytz

df_data = pd.DataFrame([
    1484161304,
    1489489785,
    1489161124,
], columns=['transaction_creation_date'])
print(df_data)

df_times = pd.DatetimeIndex(pd.to_datetime(
    df_data['transaction_creation_date'], unit='s'))

utc_at_epoch = pytz.utc.localize(dt.datetime(1970, 1, 1))
for t in df_times.tz_localize(pytz.utc):
    print(int((t - utc_at_epoch).total_seconds()))

      

Results:

   transaction_creation_date
0                 1484161304
1                 1489489785
2                 1489161124
1484161304
1489489785
1489161124
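
As a possible alternative to localizing in a separate step, pd.to_datetime accepts utc=True, which returns timezone-aware UTC values directly. The sketch below is a minimal example under that approach, not the method used above; the variable names times, epoch and seconds are only illustrative. It also converts back to epoch seconds by subtracting a UTC epoch Timestamp and floor-dividing by one second, which avoids strftime entirely.

```python
import pandas as pd

df_data = pd.DataFrame(
    {"transaction_creation_date": [1484161304, 1489489785, 1489161124]}
)

# utc=True marks the parsed values as UTC in a single step,
# so no separate tz_localize call is needed.
times = pd.to_datetime(df_data["transaction_creation_date"], unit="s", utc=True)
print(times)

# Round trip back to integer epoch seconds without relying on '%s'.
epoch = pd.Timestamp("1970-01-01", tz="UTC")
seconds = (times - epoch) // pd.Timedelta(seconds=1)
print(seconds.tolist())
```

The round trip should give back the original integers regardless of the machine's local time zone, since every value involved is explicitly UTC.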

      
