Using scikit-learn SVR, how do you combine categorical and continuous features to predict a target?

I want to use a support vector machine to solve a regression problem: predicting teachers' income from several features that are a mixture of categorical and continuous. For example, I have race [white, Asian, Hispanic, black], years of teaching, and years of education.

For the categorical feature, I used the scikit-learn preprocessing module to one-hot encode the 4 races. A white teacher then looks like [1,0,0,0], so I have an array like [[1,0,0,0], [0,1,0,0], ..., [0,0,1,0], [1,0,0,0]] representing each teacher's race, encoded for SVR. So far I can only perform a regression of income on race alone, i.e.:

clf = SVR(C=1.0)
clf.fit(racearray, income)
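For reference, the one-hot encoding step described above can be done roughly like this (a sketch using scikit-learn's LabelBinarizer; the exact encoder and column order I used may differ):

from sklearn.preprocessing import LabelBinarizer

# example race labels, one per teacher
races = ['white', 'Asian', 'Hispanic', 'black', 'white']

lb = LabelBinarizer()
racearray = lb.fit_transform(races)  # shape (n_teachers, 4), one one-hot row per teacher
print(lb.classes_)                   # the column order of the encoding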


I can also perform regression using the continuous features on their own. However, I don't know how to combine the two kinds of features, i.e.:

continuousarray = list(zip(yearsteaching, yearseducation))
clf.fit((racearray, continuousarray), income)  # how do I combine these correctly?




1 answer


You can use scikit-learn's OneHotEncoder. If your data is in a numpy array "racearray" whose columns are

[continuous_feature1, continuous_feature2, categorical, continuous_feature3]

then your code should look like this (remember that numpy indexing starts at 0):



from sklearn.preprocessing import OneHotEncoder

# encode only column 2 (the categorical race column); the remaining columns pass through unchanged
enc = OneHotEncoder(categorical_features=[2])
race_encoded = enc.fit_transform(racearray)


You can inspect your race_encoded array as usual and use it in SVR like this:

clf = SVR(C=1.0)
clf.fit(race_encoded, income)
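Alternatively, if you already have the race columns one-hot encoded separately (as in the question), a simple way to combine them with the continuous features is to stack everything column-wise into one feature matrix before fitting. A minimal sketch, assuming racearray is the (n_teachers, 4) one-hot array and yearsteaching, yearseducation and income are 1-D numpy arrays of the same length:

import numpy as np
from sklearn.svm import SVR

# put the one-hot race columns and the continuous columns side by side
X = np.column_stack((racearray, yearsteaching, yearseducation))  # shape (n_teachers, 6)

clf = SVR(C=1.0)
clf.fit(X, income)

Note that the categorical_features argument was removed from OneHotEncoder in newer scikit-learn releases, so on recent versions you would either stack arrays manually as above or use a ColumnTransformer to one-hot encode only the categorical column.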










