Pandas returns error "Passed header names mismatches usecols"

The following works as expected. There are 190 columns which all read perfectly.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             # usecols=columns[:10], 
             nrows=10
             )


I've used the usecols argument before, so I'm wondering why it no longer works for me. I would assume that simply slicing the first 10 column names would trivially work, but I keep getting the "Passed header names mismatches usecols" error shown below.

I am using pandas 0.16.2.

pd.read_csv("data.csv", 
             header=None,
             names=columns,
             usecols=columns[:10], 
             nrows=10
             )


Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44> in <module>()
      3                     nrows=10,
      4                     header=None,
----> 5                     names=columns,
      6                     )

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    472                     skip_blank_lines=skip_blank_lines)
    473 
--> 474         return _read(filepath_or_buffer, kwds)
    475 
    476     parser_f.__name__ = name

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    248 
    249     # Create the parser.
--> 250     parser = TextFileReader(filepath_or_buffer, **kwds)
    251 
    252     if (nrows is not None) and (chunksize is not None):

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    564             self.options['has_index_names'] = kwds['has_index_names']
    565 
--> 566         self._make_engine(self.engine)
    567 
    568     def _get_options_with_defaults(self, engine):

/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    703     def _make_engine(self, engine='c'):
    704         if engine == 'c':
--> 705             self._engine = CParserWrapper(self.f, **self.options)
    706         else:
    707             if engine == 'python':

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1070         kwds['allow_leading_cols'] = self.index_col is not False
   1071 
-> 1072         self._reader = _parser.TextReader(src, **kwds)
   1073 
   1074         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()

pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()

ValueError: Passed header names mismatches usecols


1 answer


It turns out there were 191 columns in the dataset (not 190). Because names had one entry fewer than the data, pandas automatically set my first data column as the index. I don't quite understand why this caused the error, since every name I passed was actually present in the parsed dataset.
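A tiny illustration of that behavior, using an in-memory CSV rather than the question's data.csv (three data columns, only two names; the values are purely an assumed example):

import io
import pandas as pd

# Two names for three columns: pandas uses the leftover first column as the index.
df = pd.read_csv(io.StringIO(u"1,2,3\n4,5,6\n"), header=None, names=["a", "b"])
print(df.index.tolist())   # [1, 4] -- the first data column became the index
print(list(df.columns))    # ['a', 'b']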

So the solution is to make sure that the number of entries in names exactly matches the number of columns in your dataset.
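A minimal sketch of that check, reusing the data.csv file and the columns list of names from the question (n_cols and df are just illustrative names):

import pandas as pd

# Count the columns actually present by parsing a single row with no names.
n_cols = pd.read_csv("data.csv", header=None, nrows=1).shape[1]
assert len(columns) == n_cols, \
    "names has %d entries but data.csv has %d columns" % (len(columns), n_cols)

# Once the counts match, usecols can safely take a slice of names.
df = pd.read_csv("data.csv",
                 header=None,
                 names=columns,
                 usecols=columns[:10],
                 nrows=10)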



Also, I found this GitHub discussion.
