Python - Pandas - Convert YYYYMM to datetime

I'm a beginner Python user (and therefore new to pandas too). I am trying to import some data into a pandas DataFrame. One of the columns is a date, but in the format "YYYYMM". I tried to do what most of the forum answers suggest:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')

      

It doesn't work (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year with MM = 13. The source used this row as the average for the last year. I guess to_datetime has a problem with this.

Can anyone suggest a quick solution, either to strip all yearly averages (those with the last two digits "13") or to have to_datetime ignore them?



2 answers


Pass errors='coerce' and then use dropna to remove the NaT rows:

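df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce')
df_cons = df_cons.dropna(subset=['YYYYMM'])  # drop on the frame; assigning a dropna'd Series back would realign and restore the NaT rows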

      

Duff (invalid) month values are converted to NaT:

In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')

Out[36]: NaT
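To make the coerce-then-drop approach concrete, here is a minimal sketch on a small made-up frame. The sample values and the `value` column are assumptions for illustration only; the asker's real data is not shown in the thread.

```python
import pandas as pd

# Hypothetical sample data: month "13" stands for the yearly average row.
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 3.0, 2.0, 4.0],
})

# Invalid months (such as "13") become NaT instead of raising ValueError.
df_cons["YYYYMM"] = pd.to_datetime(
    df_cons["YYYYMM"], format="%Y%m", errors="coerce"
)

# Drop the rows whose date could not be parsed, then renumber the index.
df_cons = df_cons.dropna(subset=["YYYYMM"]).reset_index(drop=True)
print(df_cons)
```

Calling dropna on the frame (with `subset`) rather than on the converted Series is what actually removes the yearly-average rows.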

      



Alternatively, you can filter them out before converting:

df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')

      

although this can lead to alignment issues: the filtered series is shorter than the frame, so assigning it back leaves NaT in the rows that were filtered out. Simply passing errors='coerce' is the easier solution.
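The alignment behaviour mentioned above can be seen in a short sketch. The sample values and the extra `date` column are assumptions added for illustration, so the original strings stay visible next to the result.

```python
import pandas as pd

# Hypothetical sample data with one yearly-average row ("201613").
df_cons = pd.DataFrame({"YYYYMM": ["201612", "201613", "201701"]})

# Convert only the rows that are real months.
mask = df_cons["YYYYMM"].str[-2:] != "13"
converted = pd.to_datetime(df_cons.loc[mask, "YYYYMM"], format="%Y%m")

# The converted Series has two rows, the frame has three. pandas aligns
# on the index during assignment, so the skipped row is filled with NaT.
df_cons["date"] = converted
print(df_cons)
```

The "13" row is therefore still in the frame afterwards, and it still needs a dropna (or a filter on the frame itself) to go away.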



Clean the data frame first.

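df_cons = df_cons[~df_cons['YYYYMM'].str.endswith('13')].copy()  # .copy() avoids SettingWithCopyWarning on the next line
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')  # explicit format so 'YYYYMM' is not guessed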

      

May I also suggest turning the column into a period index, if the YYYYMM column is unique in your dataset?

Move YYYYMM into the index first, then convert it to a monthly period.

df_cons = df_cons.reset_index().set_index('YYYYMM').to_period('M')
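Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. The sample values and the `value` column are assumptions for illustration, and reset_index is left out here because the default index carries no information in this toy frame.

```python
import pandas as pd

# Hypothetical sample data: "201613" is the yearly-average row.
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613"],
    "value": [1.0, 3.0, 2.0],
})

# 1. Remove the yearly-average rows before parsing.
df_cons = df_cons[~df_cons["YYYYMM"].str.endswith("13")].copy()

# 2. Parse the remaining strings as dates (first day of each month).
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m")

# 3. Use the dates as the index and convert them to monthly periods.
df_cons = df_cons.set_index("YYYYMM").to_period("M")
print(df_cons.index)
```

A monthly PeriodIndex represents each entry as a whole month rather than as a timestamp on the first of that month, which may be a closer match for data reported per month.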

      
