Pandas dataframe column handling with mixed date formats

I imported a CSV file with mixed date formats - some date strings recognized by read_csv, plus some in Excel's serial datetime format (e.g. 41866.321).

After the import, the column's dtype is reported as object (since it holds mixed types), and the dates in both formats are stored as strings.

I would like to use the to_datetime method to convert the recognized string date formats to datetime in the dataframe column, leaving the unrecognized Excel-serial strings as they are so I can isolate and fix them offline. But I can't get it to work without iterating row by row, which is too slow.

Does anyone have a smarter way to solve this?

Update: after digging some more, I found this solution, using errors='coerce' to force the type conversion of the column and then identifying the resulting null values, which I can cross-reference against the original file. But if there is a better way to do it (like converting the unrecognized timestamps in place), please let me know.

df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')
nulls = df1['Time_Date'][df1['DateTime'].isnull()]





1 answer


After digging a little more, I found this solution: use errors='coerce' to force the type conversion of the column, then identify the resulting null values, which I can cross-reference against the original file. But if there is a better way to do this (like converting the unrecognized timestamps in place), please let me know.



df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')
nulls = df1['Time_Date'][df1['DateTime'].isnull()]
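If you do want the Excel serial values converted in place rather than fixed offline, a two-pass approach works: coerce the recognized strings first, then interpret whatever failed as Excel day numbers via to_datetime's unit/origin parameters. A minimal sketch, assuming the 'Time_Date'/'DateTime' column names from the question, sample values invented for illustration, and the Windows 1900 date system (Excel day 0 = 1899-12-30):

```python
import pandas as pd

# Hypothetical sample data: a recognized date string, an Excel serial, junk.
df1 = pd.DataFrame({'Time_Date': ['2014-08-15 07:42:00', '41866.321', 'not a date']})

# Pass 1: convert strings pandas recognizes; failures become NaT.
df1['DateTime'] = pd.to_datetime(df1['Time_Date'], errors='coerce')

# Pass 2: treat the remaining values as Excel serial day numbers.
# origin='1899-12-30' is day 0 in Excel's Windows 1900 date system.
mask = df1['DateTime'].isnull()
serials = pd.to_numeric(df1.loc[mask, 'Time_Date'], errors='coerce')
df1.loc[mask, 'DateTime'] = pd.to_datetime(serials, unit='D', origin='1899-12-30')

# Anything still NaT (e.g. 'not a date') genuinely failed both passes
# and can be cross-referenced against the original file.
```

Note that the second pass only touches rows the first pass could not parse, so recognized dates are never reinterpreted as serial numbers.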

      









