Replacing NaT with Epoch in Pandas

Missing NaT values appear at the end of my dataframe, as shown below. They evidently cause this ValueError:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pytz/tzinfo.py", line 314
    loc_dt = tzinfo.normalize(dt.replace(tzinfo=tzinfo))
ValueError: month must be in 1..12

I tried using both dropna:

data[col_name].dropna(0, inplace=True)

      

and fillna, as recommended in the Working with missing data section of the docs:

data[col_name].fillna(0, inplace=True)

      

Before either of these lines, I tried to clean the data by replacing non-datetimes with the epoch time:

data[col_name] = a_col.apply(lambda x: x if isinstance(x, datetime.datetime)  else epoch)

      

Since NaT is technically a datetime, it passes the isinstance check and is not replaced by this lambda. Since isnull does detect it, I wrote this function to apply to data[col_name]:

def replace_time(x):
    if pd.isnull(x):
        return epoch
    elif isinstance(x, datetime.datetime):
        return x
    else:
        return epoch

      

Although execution enters the pd.isnull branch, the value is not changed. However, when I try this function on this series (where the second value is NaT), it works:

s = pd.Series([pd.Timestamp('20130101'),np.nan,pd.Timestamp('20130102 9:30')],dtype='M8[ns]')

      

Data:

2003-04-29 00:00:00

NaT

NaT

NaT



2 answers


Try:



data[col_name] = a_col.apply(lambda x: x if isinstance(x, datetime.datetime) 
                                       and not isinstance(x, pd.tslib.NaTType) else epoch)
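
As a variation on the same idea, here is a minimal sketch that checks pd.isnull first instead of referring to pd.tslib.NaTType, which may not be available in newer pandas releases. The sample series and the value of epoch are assumptions made up for illustration, not the asker's data.

```python
import datetime

import pandas as pd

# Assumed epoch value; the question does not show how epoch is defined.
epoch = datetime.datetime(1970, 1, 1)


def replace_time(x):
    # Check for missing values first, because NaT can pass the isinstance test.
    if pd.isnull(x) or not isinstance(x, datetime.datetime):
        return epoch
    return x


# Hypothetical column holding a valid timestamp, a NaT and a non-datetime value.
a_col = pd.Series([pd.Timestamp("2003-04-29"), pd.NaT, "junk"], dtype=object)

# Rebind the result; apply returns a new Series.
cleaned = a_col.apply(replace_time)
print(cleaned)
```

The result is assigned to a new name here; in the question's setting it would be assigned back to data[col_name].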

      



The main problem here is that you are using chained indexing in this expression:

data[col_name].dropna(0, inplace=True)

      

This potentially modifies a copy, so nothing in the original dataframe actually changes. It is quite difficult for pandas to show a SettingWithCopy warning in this case. See the docs here.
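
To illustrate the chained-indexing point, here is a rough sketch of the non-chained form: call dropna on the dataframe itself and rebind the result. The small dataframe and the column names are hypothetical stand-ins for the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's dataframe and column name.
data = pd.DataFrame({
    "when": [pd.Timestamp("2003-04-29"), pd.NaT, pd.NaT],
    "n": [1, 2, 3],
})
col_name = "when"

# Chained form from the question, which may act on a temporary object:
# data[col_name].dropna(inplace=True)

# Non-chained form: drop rows where the column is missing and rebind the frame.
data = data.dropna(subset=[col_name])
print(data)
```

Only the row with a valid timestamp remains afterwards.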



.fillna/.dropna are the appropriate methods for handling missing values in datetime64[ns] dtypes. Using .apply is pretty inefficient.

In [16]: df = pd.DataFrame({ 'date' : pd.Series([pd.Timestamp('20130101'),np.nan,pd.Timestamp('20130102 9:30')]) })

In [17]: df
Out[17]: 
                 date
0 2013-01-01 00:00:00
1                 NaT
2 2013-01-02 09:30:00

In [18]: df.date.fillna(0)
Out[18]: 
0   2013-01-01 00:00:00
1   1970-01-01 00:00:00
2   2013-01-02 09:30:00
Name: date, dtype: datetime64[ns]
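
How an integer fill value is treated on a datetime column may differ between pandas versions, so a more explicit sketch passes the epoch as a pd.Timestamp and assigns the result back to the column. The dataframe mirrors the one in the session above; the name epoch is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [pd.Timestamp("2013-01-01"), pd.NaT, pd.Timestamp("2013-01-02 09:30")],
})

# pd.Timestamp(0) is the Unix epoch, 1970-01-01 00:00:00.
epoch = pd.Timestamp(0)

# Assign back rather than using inplace=True on a column selection.
df["date"] = df["date"].fillna(epoch)
print(df)
```

The column keeps a datetime dtype because the fill value is itself a timestamp.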

      
