Working with timezone-naive dates in Pandas

I have some JSON data containing timezone-naive datetime strings:

["2014-03-07T09:04:26.943", "2014-03-06T20:35:21.937", "2014-02-25T12:39:44"]


I read this data with pandas.read_json, but it leaves the values as a column of strings (object dtype).

I know the data is in the Pacific time zone, not UTC.

Is there a vectorized way to convert this to an np.datetime64 column? I am currently doing:

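import pandas as pd

def _parse_datetime(dt_string):
    # The data is timezone naive but in Pacific time: localize it, then convert it to UTC.
    timestamp = pd.Timestamp(dt_string, tz="US/Pacific")
    if pd.isnull(timestamp):
        return pd.NaT
    # Convert to UTC and drop the tz info, returning a naive (UTC) np.datetime64.
    return timestamp.tz_convert("UTC").tz_localize(None).to_datetime64()

# data: a Series of the date strings
data.apply(_parse_datetime)

This works, but it is very slow for large amounts of data.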

UPDATE:

By specifying convert_dates I can coerce the data to datetimes. However, when I try to localize them, I get an error:

>>> dates = """["2014-03-07T09:04:26.943", "2014-03-06T20:35:21.937", "2014-02-25T12:39:44"]""" 
>>> baz = pd.read_json(dates, convert_dates=[0])[0]
>>> baz.tz_localize('US/Pacific')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/abeer/.virtualenvs/venv/lib/python2.7/site-packages/pandas/core/generic.py", line 3494, in tz_localize
ax_name)
TypeError: index is not a valid DatetimeIndex or PeriodIndex


In general, I need to do this for a column in a DataFrame, so I cannot change the index.

Answer:


Use the convert_dates parameter to specify which columns to parse as dates, or pass typ='series', in which case the values should be converted automatically:

>>> pd.read_json(dates, convert_dates=[0])[0]
0   2014-03-07 09:04:26.943000
1   2014-03-06 20:35:21.937000
2          2014-02-25 12:39:44
Name: 0, dtype: datetime64[ns]
>>> pd.read_json(dates, typ='series')
0   2014-03-07 09:04:26.943000
1   2014-03-06 20:35:21.937000
2          2014-02-25 12:39:44
dtype: datetime64[ns]
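If read_json leaves the column as plain strings (object dtype), a minimal vectorized alternative is to parse it with pd.to_datetime in one call. This sketch assumes pandas 2.0 or later, where format="ISO8601" is needed because the strings mix timestamps with and without fractional seconds; it builds the Series with json.loads instead of read_json only to keep the example self-contained.

```python
import json

import pandas as pd

dates = '["2014-03-07T09:04:26.943", "2014-03-06T20:35:21.937", "2014-02-25T12:39:44"]'

# A Series of plain strings, like the object column read_json can produce.
raw = pd.Series(json.loads(dates))

# One vectorized parse; format="ISO8601" (pandas >= 2.0) accepts ISO strings
# with and without fractional seconds in the same column.
baz = pd.to_datetime(raw, format="ISO8601")
print(baz)
print(baz.dtype)
```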


From there, you can call tz_localize on each Timestamp, although this may be too slow for large data:



baz.apply(lambda ts: ts.tz_localize('US/Pacific'))
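In newer pandas versions, the Series.dt accessor localizes the values of a datetime Series directly, without a Python-level apply and without touching the index. A minimal sketch using the dates from the question (the Series is built from Timestamps here only to keep the example self-contained):

```python
import pandas as pd

baz = pd.Series([
    pd.Timestamp("2014-03-07 09:04:26.943"),
    pd.Timestamp("2014-03-06 20:35:21.937"),
    pd.Timestamp("2014-02-25 12:39:44"),
])

# Vectorized: attach the Pacific zone to every value; the index is left alone.
pacific = baz.dt.tz_localize("US/Pacific")
print(pacific)
```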


The Series method tz_localize, by contrast, works on the index (not the values), so it applies only if the dates are moved into the index:

>>> pd.Series(index=baz).tz_localize('US/Pacific')
0
2014-03-07 09:04:26.943000-08:00   NaN
2014-03-06 20:35:21.937000-08:00   NaN
2014-02-25 12:39:44-08:00          NaN
dtype: float64
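For the question's original goal (a DataFrame column of naive Pacific-time strings converted to naive UTC datetime64 values, with the DataFrame's index left unchanged), the vectorized steps can be chained. This is a sketch, not the answerer's code: the column names "when" and "when_utc" and the string index are hypothetical, and format="ISO8601" assumes pandas 2.0 or later.

```python
import pandas as pd

# Hypothetical DataFrame with a non-default index that must be preserved.
df = pd.DataFrame(
    {"when": ["2014-03-07T09:04:26.943", "2014-03-06T20:35:21.937", "2014-02-25T12:39:44"]},
    index=["a", "b", "c"],
)

df["when_utc"] = (
    pd.to_datetime(df["when"], format="ISO8601")  # parse the strings (vectorized)
    .dt.tz_localize("US/Pacific")                 # interpret them as Pacific wall time
    .dt.tz_convert("UTC")                         # shift to UTC
    .dt.tz_localize(None)                         # drop tz info -> naive UTC datetime64
)
print(df)
```

Missing values parse to NaT and pass through each step unchanged. By default tz_localize raises on wall times that are ambiguous or nonexistent around DST transitions; its ambiguous and nonexistent parameters control that behavior.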

