Date parsing error in Python pandas when reading file

Follow-up to the question: Python pandas to read into file with date

I am unable to parse the date on the dataframe below. The code looks like this:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', 
                 parse_dates={'datetime': [0,1,2]}, delim_whitespace=True,
                 date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))

      


         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1    
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00

      

I get the error: <lambda>() takes exactly 1 argument (3 given)
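For context, the pandas documentation describes date_parser as being tried in several ways, one of which calls it once per row with one string per date column. With three date columns, that means three arguments, which a one-argument lambda cannot accept. A minimal stdlib-only sketch of the mismatch follows; parse_ymd is a hypothetical helper name, not something from the original code:

```python
from datetime import datetime

# The parser from the question: it accepts exactly one argument.
one_arg = lambda x: datetime.strptime(x, "%Y %m %d")

# Calling it with one string per date column (three here) fails.
try:
    one_arg("2006", "1", "1")
    error_message = None
except TypeError as exc:
    error_message = str(exc)


def parse_ymd(year, month, day):
    """Hypothetical helper: accept the three parts and join them before parsing."""
    return datetime.strptime(f"{year} {month} {day}", "%Y %m %d")


parsed = parse_ymd("2006", "1", "1")
print(error_message)
print(parsed)
```

A parser written this way accepts the three-string call, although it would still fail on a row whose values are missing.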

Based on @EdChum's comment below, if I use this code:

df = pandas.read_csv(file_name, skiprows=2, index_col='datetime',
                     parse_dates={'datetime': [0, 1, 2]}, delim_whitespace=True)

      

df.index comes back with dtype object rather than as a DatetimeIndex:

df.index
Index([u'2006 1 1',u'2006 1 2'....,u'nan nan nan'],dtype='object')

      

Finally, the file is available here:

https://www.dropbox.com/s/0xgk2w4ed9mi4lx/test.txt?dl=0


1 answer


OK, I see the problem: your file has extraneous blank lines at the end. Unfortunately this confused the parser, which was splitting on whitespace, and it made df look like this:

Out[25]:
             PRCP     VWC1
datetime                  
2006 1 1      0.0  0.17608
2006 1 2      6.0  0.21377
2006 1 3      0.1  0.22291
2006 1 4      3.0  0.23460
2006 1 5      6.7  0.26076
nan nan nan   NaN      NaN
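The read_csv documentation notes that an index which cannot be represented as datetimes (for example because of one unparsable value) is returned unaltered as object dtype. A small stdlib-only sketch shows that the last label above is the one that cannot be parsed; try_parse is a hypothetical helper and the list is just a few of the index labels copied from the output:

```python
from datetime import datetime

# A few index labels from the output above, including the bad last row.
index_values = ["2006 1 1", "2006 1 2", "nan nan nan"]


def try_parse(text):
    """Hypothetical helper: return a datetime, or None if the text does not match."""
    try:
        return datetime.strptime(text, "%Y %m %d")
    except ValueError:
        return None


parsed = [try_parse(value) for value in index_values]
unparsable = [v for v, p in zip(index_values, parsed) if p is None]
print(unparsable)
```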

      

When I removed the blank lines, the file imported and the dates parsed fine:



Out[26]:
            PRCP     VWC1
datetime                 
2006-01-01   0.0  0.17608
2006-01-02   6.0  0.21377
2006-01-03   0.1  0.22291
2006-01-04   3.0  0.23460
2006-01-05   6.7  0.26076

      

and the index is now a DatetimeIndex, as desired:

In [27]:

df.index
Out[27]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None
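If editing the file by hand is not an option, one possible approach is to drop all-empty rows after reading and then build the index explicitly. This is only a sketch under some assumptions: the sample string stands in for the real file, sep=r"\s+" is swapped in for delim_whitespace=True (which recent pandas versions deprecate), and the index is assembled with pd.to_datetime instead of relying on date_parser. Whether your pandas version produces the all-NaN row at all may vary, so the dropna call is purely defensive.

```python
import io

import pandas as pd

# Stand-in for the real file: two header lines, a column row, data, blank tail.
sample = (
    "         OTH-000.opc\n"
    "              XKN1=    0.500000E-01\n"
    "    Y   M   D     PRCP     VWC1\n"
    " 2006   1   1      0.0  0.17608E+00\n"
    " 2006   1   2      6.0  0.21377E+00\n"
    " 2006   1   3      0.1  0.22291E+00\n"
    " 2006   1   4      3.0  0.23460E+00\n"
    " 2006   1   5      6.7  0.26076E+00\n"
    "\n"
    "\n"
)

# Read the columns as plain numbers first, without any date parsing.
df = pd.read_csv(io.StringIO(sample), skiprows=2, sep=r"\s+")

# Defensive: discard any row that is entirely missing (e.g. from blank lines).
df = df.dropna(how="all")

# Assemble the dates from the Y/M/D columns; to_datetime expects these names.
parts = df[["Y", "M", "D"]].astype(int)
parts.columns = ["year", "month", "day"]
df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="datetime")
df = df.drop(columns=["Y", "M", "D"])

print(df)
print(type(df.index))
```

Reading the date parts as ordinary columns first keeps one bad row from turning the whole index into strings, because the cleanup happens before any date conversion.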

      
