Using scikit-learn SVR, how do you combine categorical and continuous features to predict a target?
I want to use a support vector machine for a regression problem: predicting teachers' income from several features that are a mixture of categorical and continuous variables. For example, I have race [white, Asian, Hispanic, black], years of teaching and years of education.
For the categorical feature, I used scikit-learn's preprocessing module to one-hot encode the 4 races. A white teacher becomes [1,0,0,0], so I have an array {[1,0,0,0], [0,1,0,0], ... [0,0,1,0], [1,0,0,0]} holding the encoded race of each teacher for SVR. I can perform regression of income on race alone, i.e.:
clf = SVR(C=1.0)
clf.fit(racearray, income)
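For reference, a minimal sketch of how racearray could be built with OneHotEncoder directly from race labels. The teacher data below is made up for illustration, and passing categories explicitly is an assumption, used so that white maps to [1,0,0,0] as in the question.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVR

# Hypothetical toy data, invented for illustration only (one row per teacher)
races = np.array([["white"], ["Asian"], ["Hispanic"], ["black"], ["white"]])
income = np.array([52000.0, 55000.0, 50000.0, 51000.0, 53000.0])

# Fix the category order so that white -> [1,0,0,0], Asian -> [0,1,0,0], ...
enc = OneHotEncoder(categories=[["white", "Asian", "Hispanic", "black"]])
# fit_transform returns a sparse matrix by default; densify it for inspection
racearray = enc.fit_transform(races).toarray()

# Regression of income on race alone
clf = SVR(C=1.0)
clf.fit(racearray, income)
```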
I can also perform regression using only the continuous features. However, I don't know how to combine both kinds of features in one model. This is what I tried:
continuousarray = np.array(list(zip(yearsteaching, yearseducation)))
clf.fit((racearray, continuousarray), income)  # fails: fit() expects a single 2-D array X
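One straightforward way to combine the two is to concatenate the one-hot columns and the continuous columns side by side into a single feature matrix. A minimal sketch with made-up values, reusing the question's variable names:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical toy data, invented for illustration only
racearray = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1],
                      [1, 0, 0, 0]], dtype=float)
yearsteaching = [3, 10, 5, 20, 8]
yearseducation = [16, 18, 17, 20, 16]
income = np.array([52000.0, 55000.0, 50000.0, 51000.0, 53000.0])

# Stack the continuous features as columns: shape (n_samples, 2)
continuousarray = np.column_stack((yearsteaching, yearseducation))
# Concatenate horizontally: shape (n_samples, 4 + 2)
X = np.hstack((racearray, continuousarray))

clf = SVR(C=1.0)
clf.fit(X, income)
```

SVR with the default RBF kernel is sensitive to feature scale, so in practice the continuous columns are usually standardized (for example with StandardScaler) before fitting.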
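You can use scikit-learn's OneHotEncoder. If all your features are in a single numpy array "racearray" whose columns are
[continuous_feature1, continuous_feature2, categorical, continuous_feature3]
then your code should look like this (remember that numpy indexing starts at 0, so the categorical column is index 2; the legacy encoder also expects that column to be integer-coded, e.g. 0-3 for the four races):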
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(categorical_features=[2])  # legacy API: deprecated in 0.20, removed in scikit-learn 0.22
race_encoded = enc.fit_transform(racearray)
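race_encoded is returned as a scipy sparse matrix by default (call race_encoded.toarray() to inspect it), with the one-hot columns placed before the untouched continuous columns. You can pass it to SVR directly: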
from sklearn.svm import SVR
clf = SVR(C=1.0)
clf.fit(race_encoded, income)
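In current scikit-learn versions, the same column-wise encoding is done with ColumnTransformer. A minimal sketch with made-up data in the column layout above; unlike the legacy encoder, it also standardizes the continuous columns (a deliberate swap, since SVR is scale-sensitive), and handle_unknown="ignore" is an optional choice so unseen race labels at prediction time do not raise an error.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Hypothetical toy data, invented for illustration only. Columns:
# [continuous_feature1, continuous_feature2, categorical, continuous_feature3]
X = np.array([
    [3, 16, "white", 1.0],
    [10, 18, "Asian", 2.5],
    [5, 17, "Hispanic", 0.5],
    [20, 20, "black", 3.0],
    [8, 16, "white", 1.5],
], dtype=object)
income = np.array([52000.0, 55000.0, 50000.0, 51000.0, 53000.0])

preprocess = ColumnTransformer(
    [
        # One-hot encode column 2 (string labels are accepted directly)
        ("race", OneHotEncoder(handle_unknown="ignore"), [2]),
        # Standardize the continuous columns 0, 1 and 3
        ("num", StandardScaler(), [0, 1, 3]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# The pipeline applies the same preprocessing at fit and predict time
model = make_pipeline(preprocess, SVR(C=1.0))
model.fit(X, income)
predictions = model.predict(X)
```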