Date parsing error in Python pandas when reading file
Follow-up to the question: Python pandas to read into file with date
I am unable to parse the dates when reading the file below into a dataframe. The code looks like this:
df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime',
parse_dates={'datetime': [0,1,2]}, delim_whitespace=True,
date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))
OTH-000.opc
XKN1= 0.500000E-01
Y M D PRCP VWC1
2006 1 1 0.0 0.17608E+00
2006 1 2 6.0 0.21377E+00
2006 1 3 0.1 0.22291E+00
2006 1 4 3.0 0.23460E+00
2006 1 5 6.7 0.26076E+00
I get the error: <lambda>() takes exactly 1 argument (3 given)
Based on @EdChum's comment below, if I use this code:
df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True)
then df.index has dtype object instead of being a datetime index:
df.index
Index([u'2006 1 1',u'2006 1 2'....,u'nan nan nan'],dtype='object')
Finally, the file is available here:
1 answer
OK, I see the problem: your file has extraneous blank lines at the end. Unfortunately this confuses the parser, which is splitting on whitespace, so the blank lines turn into a row of NaN values and the date strings are left unparsed. This made df look like this:
Out[25]:
PRCP VWC1
datetime
2006 1 1 0.0 0.17608
2006 1 2 6.0 0.21377
2006 1 3 0.1 0.22291
2006 1 4 3.0 0.23460
2006 1 5 6.7 0.26076
nan nan nan NaN NaN
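To confirm that trailing blank lines are the cause before editing the file, something like the following could be used. It is only a minimal sketch: the helper name count_trailing_blank_lines and the sample string are made up for illustration and are not part of pandas or of the original file.

```python
def count_trailing_blank_lines(text):
    """Count empty or whitespace-only lines at the end of the text."""
    count = 0
    for line in reversed(text.splitlines()):
        if line.strip():
            # First line with real content: stop counting.
            break
        count += 1
    return count


# Hypothetical tail of the data file: two data rows followed by
# one empty line and one line that contains only spaces.
sample = "2006 1 4 3.0 0.23460E+00\n2006 1 5 6.7 0.26076E+00\n\n  \n"
trailing = count_trailing_blank_lines(sample)
print(trailing)
```

For a real file, the text would come from open(file_name).read(); any result above zero suggests the NaN row has the cause described here.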
When I remove the blank lines, the file imports and the dates parse correctly:
Out[26]:
PRCP VWC1
datetime
2006-01-01 0.0 0.17608
2006-01-02 6.0 0.21377
2006-01-03 0.1 0.22291
2006-01-04 3.0 0.23460
2006-01-05 6.7 0.26076
and the index is now a DatetimeIndex, as desired:
In [27]:
df.index
Out[27]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None
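If editing the file by hand is not practical, the same result can probably be reached in code. The sketch below is one possible approach, not the method used above: it reads the columns without date parsing, drops any row that is entirely NaN, and then builds the index from the Y, M and D columns with pandas.to_datetime. The function name read_opc and the inline SAMPLE text are assumptions made for the example.

```python
import io

import pandas as pd

# Sample mirroring the file in the question, with trailing blank lines.
SAMPLE = (
    "OTH-000.opc\n"
    "XKN1= 0.500000E-01\n"
    "Y M D PRCP VWC1\n"
    "2006 1 1 0.0 0.17608E+00\n"
    "2006 1 2 6.0 0.21377E+00\n"
    "2006 1 3 0.1 0.22291E+00\n"
    "\n"
    "   \n"
)


def read_opc(source):
    """Read the whitespace-delimited file and index it by date."""
    # Skip the two header lines; the third line holds the column names.
    df = pd.read_csv(source, skiprows=2, sep=r"\s+")
    # Discard rows that came from blank lines (all values missing).
    df = df.dropna(how="all")
    # to_datetime can assemble dates from columns named year, month, day.
    parts = df[["Y", "M", "D"]].astype(int)
    parts.columns = ["year", "month", "day"]
    df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="datetime")
    return df.drop(columns=["Y", "M", "D"])


df = read_opc(io.StringIO(SAMPLE))
print(df.index)
```

Passing a file name instead of the StringIO object should work the same way. Building the dates after the read keeps the blank-line cleanup and the date parsing as separate steps, so a bad row cannot silently turn the whole index into strings.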