How to choose C and gamma AFTER grid search with libSVM (RBF kernel) for best generalization?

I am aware of many questions about choosing the "best" C and gamma values for SVMs (RBF kernel). The standard answer is grid search; however, my question starts after the grid search results are in. Let me explain:

I have a dataset of 10 subjects on which I perform leave-one-subject-out cross-validation, which means I run a grid search for each left-out subject. To avoid tuning the parameters on the held-out test subjects, I do not want to choose the best C and gamma by averaging the accuracy over all 10 models and taking the maximum. For each model in the cross-validation, I could instead run another inner cross-validation on that model's training data (excluding that fold's test subject), but you can imagine the computational effort, and I don't have enough time for that.
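For concreteness, this is the kind of per-subject grid evaluation I mean (a minimal sketch in scikit-learn, assuming hypothetical arrays X for the features, y for the labels and groups for the subject ids):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Hypothetical log2-spaced search ranges
C_range = 2.0 ** np.arange(-5, 16, 2)
gamma_range = 2.0 ** np.arange(-15, 4, 2)

logo = LeaveOneGroupOut()
n_folds = logo.get_n_splits(X, y, groups)  # 10 subjects -> 10 folds

# acc[k, i, j] = balanced accuracy of fold k for (C_range[i], gamma_range[j])
acc = np.zeros((n_folds, len(C_range), len(gamma_range)))

for k, (train_idx, test_idx) in enumerate(logo.split(X, y, groups)):
    for i, C in enumerate(C_range):
        for j, gamma in enumerate(gamma_range):
            clf = SVC(kernel="rbf", C=C, gamma=gamma)
            clf.fit(X[train_idx], y[train_idx])
            acc[k, i, j] = balanced_accuracy_score(
                y[test_idx], clf.predict(X[test_idx]))
```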

Since the grid search for each of the 10 models produced a wide range of good C and gamma parameters (the differences in accuracy are only 2-4%, see Figure 1), I approached it differently.

I defined a region within each grid that contains only the accuracy values that are within 2% of that grid's maximum accuracy. All other accuracy values, with a difference greater than 2%, are set to zero (see Figure 2). I do this for each model and then take the intersection of the regions of all models. This results in a much smaller range of C and gamma values that would give an accuracy within 2% of the maximum accuracy for every model. However, the range is still quite large. So I thought about choosing the C-gamma pair with the lowest C, since that should mean I am farthest from overfitting and closest to good generalization. Can I say that?
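In code, the region-and-intersection step looks roughly like this (building on the acc array from the sketch above; the 2% tolerance assumes accuracies on a 0-1 scale):

```python
import numpy as np

tol = 0.02  # the 2% tolerance from the text (use 2.0 if accuracies are in percent)

# True where a (C, gamma) cell is within 2% of that fold's maximum accuracy
per_fold_ok = acc >= (acc.max(axis=(1, 2), keepdims=True) - tol)

# Intersection: cells that are near-optimal for every one of the 10 folds
region = per_fold_ok.all(axis=0)

# Candidate (C, gamma) pairs inside the intersected region
candidates = [(C_range[i], gamma_range[j])
              for i, j in zip(*np.nonzero(region))]
```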


How would I even choose C and gamma within this region of C-gamma pairs that have all been validated as good settings for my classifier across all 10 models? Should I focus on minimizing the C parameter? Or should I focus on minimizing both C and gamma?


I found an answer to this question here (Are high values for C or gamma problematic when using the RBF SVM kernel?) that says a combination of high C AND high gamma would mean overfitting. I understand that the gamma value changes the width of the Gaussian around the data points, but I still can't get my head around what that practically means for my dataset.
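My own way of picturing it (not from the linked answer): gamma is the inverse width of the Gaussian similarity between two points, so a large gamma makes even nearby points look dissimilar and every training point only influences a tiny neighbourhood, which is where the overfitting comes from:

```python
import numpy as np

def rbf(x, z, gamma):
    # RBF kernel value K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([0.0, 0.0]), np.array([1.0, 0.0])  # two points at distance 1
for gamma in (0.01, 1.0, 100.0):
    print(gamma, rbf(x, z, gamma))  # ~0.99, ~0.37, ~0
```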

The post led me to a different idea. Can I use the number of support vectors relative to the number of data points as a criterion for choosing among all the C-gamma pairs? Would a low ratio (number of SVs / number of data points) mean better generalization? I am prepared to lose some accuracy, since it should not affect the result I am interested in, if I get better generalization in return (at least from a theoretical point of view).
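As a sketch, the criterion I have in mind would look like this (refitting once per candidate pair from the intersected region above, on hypothetical training arrays X and y; whether the SV ratio really tracks generalization is exactly what I am asking):

```python
from sklearn.svm import SVC

def sv_fraction(C, gamma, X_train, y_train):
    # Fraction of training points that end up as support vectors
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    return clf.n_support_.sum() / len(X_train)

# Pick the candidate (C, gamma) pair with the fewest support vectors
best_C, best_gamma = min(
    candidates, key=lambda p: sv_fraction(p[0], p[1], X, y))
```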

Figure 1: Balanced accuracies after grid search

Figure 2: Balanced accuracies after applying my region and intersection criterion



1 answer


Since the linear kernel is a special case of the RBF kernel, there is a method that uses a linear SVM to tune C first, and then a bilinear search to tune the (C, gamma) pair, which saves time.



http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.880&rep=rep1&type=pdf
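A rough sketch of that two-stage idea as I read it (not code from the paper, and using the same hypothetical X, y and groups arrays as in the question):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.svm import SVC

subject_cv = list(LeaveOneGroupOut().split(X, y, groups))

# Stage 1: cheap 1-D search over C with the linear kernel
stage1 = GridSearchCV(SVC(kernel="linear"),
                      {"C": 2.0 ** np.arange(-5, 16, 2)},
                      scoring="balanced_accuracy", cv=subject_cv)
stage1.fit(X, y)
C0 = stage1.best_params_["C"]

# Stage 2: small 2-D search over (C, gamma) with the RBF kernel,
# restricted to a narrow band of C around the stage-1 value
stage2 = GridSearchCV(SVC(kernel="rbf"),
                      {"C": C0 * 2.0 ** np.arange(-2, 3),
                       "gamma": 2.0 ** np.arange(-15, 4, 2)},
                      scoring="balanced_accuracy", cv=subject_cv)
stage2.fit(X, y)
print(stage2.best_params_)
```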







