How to choose C and gamma AFTER grid search with libSVM (RBF kernel) for best generalization?
I am aware of many questions about choosing the "best" C and gamma values for an SVM with an RBF kernel. The standard answer is grid search; however, my question starts after the grid search results are in. Let me explain:
I have a dataset of 10 subjects on which I am performing leave-one-subject-out cross-validation, which means I run a grid search for each held-out subject. To avoid overfitting to the test data, I do not want to choose the best C and gamma by averaging accuracy over all 10 models and taking the maximum. For each fold, I could instead run a nested cross-validation on the training data alone (not touching the held-out subject), but you can imagine the computational effort, and I don't have enough time for that.
Since the grid search for each of the 10 models produced a wide range of good C and gamma parameters (the difference in accuracy is only 2-4%, see Figure 1), I approached it differently.
I defined a region within each grid that contains only the accuracy values within 2% of that grid's maximum accuracy; all accuracy values with a larger difference are set to zero (see Figure 2). I do this for each model and take the intersection of these regions across all models. This yields a much smaller set of C and gamma values that would give an accuracy within 2% of the maximum for every model. However, the range is still quite large. So I thought about choosing the C-gamma pair with the lowest C, since that would mean I am farthest from overfitting and closest to good generalization. Is that a valid assumption?
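The region-and-intersection step can be sketched with NumPy, assuming the grid-search results are stored as one 2-D accuracy array per model (the random grids below are placeholders for real grid-search output):

```python
import numpy as np

# Hypothetical accuracy grids, one per model: rows indexed by the C
# candidates, columns by the gamma candidates. Replace with real results.
rng = np.random.default_rng(0)
acc_grids = [rng.uniform(0.70, 0.95, size=(5, 5)) for _ in range(10)]

def near_optimal_mask(acc, tol=0.02):
    """Boolean mask of grid cells within `tol` of this grid's maximum accuracy."""
    return acc >= acc.max() - tol

# Intersect the near-optimal regions of all 10 models.
combined = np.logical_and.reduce([near_optimal_mask(a) for a in acc_grids])

# (C_index, gamma_index) pairs that are near-optimal for every model.
candidates = np.argwhere(combined)
```

With a log-spaced C axis, "lowest C in the intersection" is then simply the candidate with the smallest row index.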
How should I choose C and gamma within this region of C-gamma pairs, all of which have been verified as reliable configurations for my classifier across all 10 models? Should I focus on minimizing C? Or on minimizing both C and gamma?
I found an answer to this question here (Are high values for C or gamma problematic when using the RBF SVM kernel?) that says a combination of high C AND high gamma means overfitting. I understand that the gamma value controls the width of the Gaussian around each data point, but I still can't get my head around what that practically means for my dataset.
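One way to build intuition for gamma: the RBF kernel value K(x, z) = exp(-gamma * ||x - z||^2) measures how much influence a training point exerts at a given distance. A small numeric sketch:

```python
import numpy as np

def rbf(x, z, gamma):
    """RBF kernel value: exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

# With a small gamma, influence spreads widely (smoother, more general
# decision boundary); with a large gamma, each training point only
# influences a tiny neighbourhood (overfitting risk).
print(rbf([0, 0], [1, 0], gamma=0.1))   # ~0.905: broad influence
print(rbf([0, 0], [1, 0], gamma=10.0))  # ~4.5e-5: narrow influence
```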
That post led me to another idea. Could I use the ratio of the number of support vectors to the number of data points as a criterion for choosing among the remaining C-gamma pairs? Would a low (number of SVs / number of data points) mean better generalization? I am prepared to lose some accuracy if I get better generalization in return (at least from a theoretical point of view), as it should not affect the result that interests me.
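A sketch of that criterion using scikit-learn's SVC, which wraps libSVM and exposes the per-class support-vector counts as `n_support_`; the synthetic dataset and the candidate pairs here are placeholder assumptions, not values from the question:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Placeholder data standing in for the real subjects' features/labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

def sv_ratio(C, gamma, X, y):
    """Fraction of training points that end up as support vectors."""
    clf = SVC(C=C, gamma=gamma, kernel="rbf").fit(X, y)
    return clf.n_support_.sum() / len(X)

# Among the near-optimal (C, gamma) pairs, prefer the lowest SV ratio.
pairs = [(1.0, 0.01), (10.0, 0.1), (100.0, 1.0)]  # hypothetical candidates
best = min(pairs, key=lambda p: sv_ratio(p[0], p[1], X, y))
```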
Since the linear kernel is a special case of the RBF kernel, there is a method that uses a linear SVM to tune C first, and then tunes the (C, gamma) pair in a narrowed search afterwards, to save time:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.880&rep=rep1&type=pdf
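A rough sketch of that two-stage idea (the exact procedure in the linked paper may differ; the parameter grids and dataset here are illustrative): tune C with a cheap linear kernel first, then search the RBF (C, gamma) pair only near that value:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data standing in for the real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Stage 1: tune C with a linear kernel (one 1-D sweep instead of a 2-D grid).
Cs = [0.01, 0.1, 1.0, 10.0, 100.0]
lin_scores = [cross_val_score(SVC(kernel="linear", C=C), X, y, cv=3).mean()
              for C in Cs]
C_lin = Cs[int(np.argmax(lin_scores))]

# Stage 2: search the RBF (C, gamma) pair only in a band around C_lin.
gammas = [0.001, 0.01, 0.1, 1.0]
best = max(((C, g) for C in (C_lin / 10, C_lin, C_lin * 10) for g in gammas),
           key=lambda p: cross_val_score(
               SVC(kernel="rbf", C=p[0], gamma=p[1]), X, y, cv=3).mean())
```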