Using OneHotEncoder with sklearn_pandas DataFrameMapper
I am trying to use sklearn_pandas DataFrameMapper. It maps each column name to the preprocessing transformer that column requires. Thus:
import sklearn.preprocessing
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('hour', None),
    ('season', sklearn.preprocessing.OneHotEncoder()),
    ('holiday', None)
])
season is an int64 column in my pandas DataFrame.
This gives me the following error: "too many values to unpack". I understand that OneHotEncoder expects a 2-D array rather than a 1-D array.
How can I use OneHotEncoder with sklearn_pandas, or is it not possible?
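A minimal reproduction of the shape requirement, outside sklearn_pandas. The seasons array is made up for illustration, and the exact error text differs between scikit-learn versions, so the sketch only checks that a ValueError is raised for 1-D input and that a single-column (n_samples, 1) array is accepted.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical season codes standing in for the 'season' column.
seasons = np.array([1, 2, 3, 4, 1])

encoder = OneHotEncoder()

# 1-D input is rejected; the message wording depends on the scikit-learn version.
try:
    encoder.fit_transform(seasons)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Reshaping to one column gives the 2-D (n_samples, n_features) input the encoder expects.
# The default output is a sparse matrix, so convert it to a dense array for inspection.
one_hot = encoder.fit_transform(seasons.reshape(-1, 1)).toarray()
print(rejected_1d)
print(one_hot.shape)  # one row per sample, one column per distinct season
```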
source to share
The official version of sklearn-pandas has some problems when working with one-dimensional arrays and transformers. Try the following fork:
https://github.com/dukebody/sklearn-pandas
However, I think you can accomplish what you want using LabelBinarizer (as in the sklearn_pandas examples) instead of OneHotEncoder.
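A short sketch of why LabelBinarizer fits the string-selector case: it takes a 1-D array of labels, so a single pandas column can be passed as-is. The DataFrame below is hypothetical and only mirrors the column names from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical frame with the three columns from the question.
df = pd.DataFrame({
    "hour": [0, 6, 12, 18, 23],
    "season": [1, 2, 3, 4, 1],
    "holiday": [0, 0, 1, 0, 0],
})

binarizer = LabelBinarizer()
# The 1-D Series is accepted directly; no reshaping is needed.
season_onehot = binarizer.fit_transform(df["season"])
print(binarizer.classes_)   # the distinct seasons, sorted
print(season_onehot.shape)  # one row per sample, one column per season
```

One caveat: when a column has only two distinct values, LabelBinarizer returns a single 0/1 column rather than two one-hot columns.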
UPDATE 2015-11-28
Since sklearn-pandas 0.0.12, you can solve the problem as follows:
import sklearn.preprocessing
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('hour', None),
    (['season'], sklearn.preprocessing.OneHotEncoder()),  # list selector -> 2-D input
    ('holiday', None)
])
From the docs:
The difference between specifying a column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one-dimensional array will be passed, while in the second case it will be a two-dimensional array with one column, i.e. a column vector.
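The same shape difference can be seen with plain pandas indexing. This is an illustration of what the docs describe, assuming the mapper's selection behaves like ordinary df['col'] versus df[['col']] indexing; it is not sklearn_pandas' internal code, and the frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame standing in for the question's DataFrame.
df = pd.DataFrame({"season": [1, 2, 3, 4, 1]})

as_string = df["season"].to_numpy()  # selector 'season'   -> 1-D array
as_list = df[["season"]].to_numpy()  # selector ['season'] -> 2-D array with one column

print(as_string.shape)
print(as_list.shape)
```

The 2-D, one-column form is what OneHotEncoder accepts, which is why the list selector fixes the error.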
source to share