Poor SVM performance compared to Random Forest
I am using the scikit-learn Python library for a classification task. I used RandomForestClassifier and an SVM as well (SVC class). However, while the RF achieves about 66% precision and 68% recall, the SVM only gets up to 45%.
I did a GridSearch over the C and gamma parameters for the rbf-SVM and also tried scaling and normalization beforehand. However, I think the gap between RF and SVM is still too big.
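For reference, a setup like the one described might look roughly like the sketch below. The data is a synthetic stand-in from make_classification, and the grid values, the fold count, and the "f1" scoring choice are assumptions for illustration, not the ones actually used in the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Keeping the scaler inside the pipeline means it is refit on each
# training fold, so no information leaks from the validation fold.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical log-spaced grid over C and gamma.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1e-3, 1e-2, 1e-1, 1],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

If the best parameters land on the edge of the grid, widening the range in that direction is usually worth a try.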
What else should I consider to get adequate SVM performance?
I thought it should be possible to get at least equal results. (All scores are obtained by cross-validation on the same training and test sets.)
As EdChum said in the comments, there is no rule or guarantee that any model always works best.
The SVM with an RBF kernel assumes that the optimal decision boundary is smooth and rotation invariant (once you fix a particular feature scaling, which is itself not rotation invariant).
The random forest makes no smoothness assumption (it is a piecewise constant prediction function) and favors axis-aligned decision boundaries.
The assumptions made by the RF model may be better suited to the task.
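One way to get a feel for this is to compare both models on a toy problem whose labels are defined by axis-aligned thresholds. This is only a sketch: the synthetic data, the thresholds, and the hyperparameters are all assumptions chosen for illustration, and the result says nothing about any particular real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 8))

# Labels depend on axis-aligned thresholds of two features (XOR of two
# half-spaces); the remaining six features are pure noise.
y = ((X[:, 0] > 0.2) ^ (X[:, 1] < -0.3)).astype(int)

models = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
}

# Same 5-fold stratified cross-validation for both models.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Swapping in a label rule with a smooth, non-axis-aligned boundary makes for an interesting counter-experiment.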
BTW, thanks for grid searching C and gamma and checking the effect of feature normalization before asking on Stack Overflow :)
Edit: to gain more insight, it might be interesting to plot learning curves for the two models. It is possible that the SVM's regularization and kernel bandwidth cannot control overfitting well enough, while the ensemble nature of RF works best for this dataset size. The gap may narrow if you have more data. A learning curve plot is a good way to check how much your model would benefit from more samples.
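A minimal sketch of how such curves could be computed with scikit-learn's learning_curve follows. The synthetic dataset, the model settings, and the plot_curves helper are assumptions for illustration; plot_curves additionally assumes matplotlib is installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

models = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fractions of the available training data to evaluate.
train_fracs = np.linspace(0.2, 1.0, 5)

curves = {}
for name, model in models.items():
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, train_sizes=train_fracs, cv=5
    )
    # Average over the CV folds for each training-set size.
    curves[name] = (sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))


def plot_curves(curves):
    """Hypothetical helper: draw train/validation curves per model."""
    import matplotlib.pyplot as plt

    for name, (sizes, train_mean, val_mean) in curves.items():
        plt.plot(sizes, train_mean, "--", label=f"{name} train")
        plt.plot(sizes, val_mean, "-", label=f"{name} validation")
    plt.xlabel("Training samples")
    plt.ylabel("Score")
    plt.legend()
    plt.show()
```

Calling plot_curves(curves) draws the curves. A validation curve that is still rising at the largest size suggests more data would help, while a large, persistent gap between the training and validation curves points to overfitting.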