Working with timezone-naive datetimes in Pandas
I have some JSON data containing timezone-naive dates.
["2014-03-07T09:04:26.943", "2014-03-06T20:35:21.937", "2014-02-25T12:39:44"]
I read this data using pandas.read_json, and it comes back as a column of dtype object.
I know the data is in the Pacific time zone, not UTC.
Is there a vectorized way to convert this to an np.datetime64 column? I am currently doing:
def _parse_datetime(dt_string):
    # We are provided timezone naive data that is in Pacific time. Convert it to UTC.
    timestamp = pd.Timestamp(dt_string, tz="US/Pacific")
    if pd.isnull(timestamp):
        return pd.NaT
    return np.datetime64(timestamp)

data.apply(_parse_datetime)
which is very slow on large data.
UPDATE:
By specifying convert_dates I can coerce the data to datetimes. However, when I try to localize, I get an error:
>>> dates = """["2014-03-07T09:04:26.943", "2014-03-06T20:35:21.937", "2014-02-25T12:39:44"]"""
>>> baz = pd.read_json(dates, convert_dates=[0])[0]
>>> baz.tz_localize('US/Pacific')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/abeer/.virtualenvs/venv/lib/python2.7/site-packages/pandas/core/generic.py", line 3494, in tz_localize
    ax_name)
TypeError: index is not a valid DatetimeIndex or PeriodIndex
In general I am trying to do this for a column in a dataframe, so I cannot change the index.
Use the convert_dates parameter to specify which columns to parse, or pass typ='series', which should convert the dates automatically.
>>> pd.read_json(dates, convert_dates=[0])[0]
0 2014-03-07 09:04:26.943000
1 2014-03-06 20:35:21.937000
2 2014-02-25 12:39:44
Name: 0, dtype: datetime64[ns]
>>> pd.read_json(dates, typ='series')
0 2014-03-07 09:04:26.943000
1 2014-03-06 20:35:21.937000
2 2014-02-25 12:39:44
dtype: datetime64[ns]
From there, you can call tz_localize on each timestamp, although I suspect this is slow:
baz.apply(lambda ts: ts.tz_localize('US/Pacific'))
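If the element-wise apply turns out to be too slow, pandas 0.15 and later (newer than the version shown in the question) expose a vectorized `.dt` accessor on datetime columns. The following is only a sketch under that assumption; `baz` is rebuilt by hand here as a stand-in for the column that read_json returns, and the conversion to UTC at the end is the step the question's comment asks for.

```python
import pandas as pd

# Stand-in for the datetime64 column that read_json produces in the answer above.
baz = pd.Series([
    pd.Timestamp("2014-03-07T09:04:26.943"),
    pd.Timestamp("2014-03-06T20:35:21.937"),
    pd.Timestamp("2014-02-25T12:39:44"),
])

# Attach the Pacific zone to the naive values (wall-clock times are unchanged).
pacific = baz.dt.tz_localize("US/Pacific")

# Express the same instants in UTC.
utc = pacific.dt.tz_convert("UTC")

print(utc)
```

Both calls operate on the values of the column, so the index is left alone.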
The built-in Series.tz_localize method works on the index (not the values):
>>> pd.Series(index=baz).tz_localize('US/Pacific')
0
2014-03-07 09:04:26.943000-08:00 NaN
2014-03-06 20:35:21.937000-08:00 NaN
2014-02-25 12:39:44-08:00 NaN
dtype: float64
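Since localization is defined on a DatetimeIndex, another option is to wrap the column's values in a temporary DatetimeIndex, localize that, and put the result back into a Series that keeps the original index. This is a rough sketch rather than a tested recipe for every pandas version; `baz` is again a hand-built stand-in for the parsed column, and `localized` is just an illustrative name.

```python
import pandas as pd

# Stand-in for the datetime64 column from the examples above.
baz = pd.Series([
    pd.Timestamp("2014-03-07T09:04:26.943"),
    pd.Timestamp("2014-03-06T20:35:21.937"),
    pd.Timestamp("2014-02-25T12:39:44"),
])

# Localize the values through a temporary DatetimeIndex (vectorized).
idx = pd.DatetimeIndex(baz).tz_localize("US/Pacific")

# Rebuild a Series with the original index, so the frame's index is untouched.
localized = pd.Series(idx, index=baz.index)

print(localized)
```

The result can be assigned back to the dataframe column, because its index still matches the original rows.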