Using OneHotEncoder with sklearn_pandas DataFrameMapper

I am trying to use sklearn_pandas DataFrameMapper. It takes a list of column names, each paired with the preprocessing transformer to apply to that column (or None to pass the column through unchanged). Thus:

import sklearn.preprocessing
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('hour', None),
    ('season', sklearn.preprocessing.OneHotEncoder()),
    ('holiday', None)
])


season is an int64 column in my pandas DataFrame.

This gives me the following error: "too many values to unpack". I understand that OneHotEncoder expects a 2-D input, not a 1-D one.

How can I use OneHotEncoder with sklearn_pandas, or is it not possible?
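A minimal sketch of the 2-D requirement, assuming NumPy and a recent scikit-learn release (the API has changed since this question was asked). A 1-D array of season values has shape (n,); reshape(-1, 1) turns it into a single-column 2-D array, which OneHotEncoder accepts. The season values are made up for illustration. With current versions the 1-D call raises a ValueError, though its message differs from the error quoted above.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical season values standing in for the DataFrame column.
season = np.array([1, 2, 3, 1])      # 1-D: shape (4,)
season_2d = season.reshape(-1, 1)    # 2-D: shape (4, 1), one feature column

# The 2-D input is accepted; the default output is sparse, so densify it.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(season_2d).toarray()

# The 1-D input is rejected by current scikit-learn versions.
rejected = False
try:
    OneHotEncoder().fit(season)
except ValueError:
    rejected = True

print(season.shape, season_2d.shape)
print(onehot)
print("1-D input rejected:", rejected)
```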


1 answer


The official version of sklearn-pandas has some problems when working with one-dimensional arrays and transformers. Try the following fork: https://github.com/dukebody/sklearn-pandas

However, I think you can accomplish what you want using LabelBinarizer (as in the sklearn_pandas examples) instead of OneHotEncoder.
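A short sketch of the LabelBinarizer route, assuming a recent scikit-learn and made-up season values. Unlike OneHotEncoder, LabelBinarizer is designed for a single 1-D array of labels, so it works with the 1-D array that a plain string selector such as 'season' produces.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical 'season' values; LabelBinarizer takes the 1-D array directly.
season = np.array([1, 2, 3, 1])

binarizer = LabelBinarizer()
season_onehot = binarizer.fit_transform(season)

print(binarizer.classes_)   # the distinct labels, one output column each
print(season_onehot)

# With only two distinct labels, LabelBinarizer returns a single column.
holiday_onehot = LabelBinarizer().fit_transform(np.array([0, 0, 1, 0]))
print(holiday_onehot.shape)
```

One caveat: for a column with only two distinct values (such as a holiday flag), LabelBinarizer produces one 0/1 column rather than two indicator columns.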

UPDATE 2015-11-28

As of sklearn-pandas >= 0.0.12, you can solve the problem as follows:



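import sklearn.preprocessing
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('hour', None),
    # A one-element list selector passes a 2-D (n, 1) array to OneHotEncoder
    (['season'], sklearn.preprocessing.OneHotEncoder()),
    ('holiday', None)
])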


From the docs:

The difference between specifying a column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one-dimensional array will be passed; in the second case, it will be a two-dimensional array with one column, i.e. a column vector.
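To make that distinction concrete, here is a rough hand-rolled equivalent of the fixed mapper using only pandas, NumPy and scikit-learn. It is a simplification that assumes DataFrameMapper extracts each selector roughly as df[selector].values, passes it to the transformer (or through unchanged for None), and stacks the results side by side in the order listed; the toy data is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data using the question's column names.
df = pd.DataFrame({
    "hour": [0, 5, 13, 22],
    "season": [1, 2, 3, 1],
    "holiday": [0, 0, 1, 0],
})

# 'season' (string) selects a 1-D array; ['season'] (list) selects a 2-D array.
print(df["season"].values.shape)    # (4,)
print(df[["season"]].values.shape)  # (4, 1)

# None-transformer columns are passed through; reshape them to columns for stacking.
hour = df["hour"].values.reshape(-1, 1)
holiday = df["holiday"].values.reshape(-1, 1)

# The one-element list selector yields the 2-D input OneHotEncoder expects.
season_onehot = OneHotEncoder().fit_transform(df[["season"]].values).toarray()

# Stack the per-column outputs side by side: hour, 3 season indicators, holiday.
features = np.hstack([hour, season_onehot, holiday])
print(features.shape)  # (4, 5)
```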
