Svmtrain - specify the cost of misclassification

MATLAB offers the fitcsvm command to train an SVM. It accepts a key-value pair called Cost, which specifies the misclassification penalty for each class. Since I am using an older version of MATLAB, I need to use svmtrain instead. However, I cannot find such a key-value pair for that function. Is there a way to do it?



2 answers


The C-SVM cost parameter is also called "boxconstraint". Please see its usage in the linked documentation.
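If I remember correctly, svmtrain accepts its 'boxconstraint' value either as a scalar or as a vector with one entry per training row, so per-class costs can be emulated by repeating each class's C for all of its samples. A minimal sketch of building such a vector (shown in Python rather than MATLAB; the helper name and the cost values are illustrative, not part of any API):

```python
def per_class_box_constraints(labels, class_cost):
    """Expand a per-class cost map into one box-constraint value per sample.

    labels     -- class label of each training sample
    class_cost -- dict mapping each class label to its C value
    """
    return [class_cost[y] for y in labels]

# Toy imbalanced labels: make errors on the rare class 1 four times as costly.
labels = [0, 0, 0, 0, 1]
C = per_class_box_constraints(labels, {0: 1.0, 1: 4.0})
print(C)  # [1.0, 1.0, 1.0, 1.0, 4.0]
```

In MATLAB you would then pass the resulting vector as something like svmtrain(Training, Group, 'boxconstraint', C).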



A C-SVM with a different cost parameter for each class is called a 2C-SVM. It is important to know that this strategy is best suited to imbalanced binary datasets, where the number of samples in one class is much higher than in the other. A good pair of misclassification costs in the 2C-SVM method can also reduce the ratio of false positives (or false negatives) on balanced datasets, but it is generally very expensive to optimize these parameters. If you have time, take a look at a technique called bias shifting. With this method, you train a C-SVM model with a single cost parameter, and then you increase (or decrease) the bias parameter (b) to control the ratio of false positives (or false negatives). It is much faster than 2C-SVM and gives comparable results.
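The bias-shifting idea can be sketched without any SVM training at all: keep the learned weights fixed and only move b, which slides the decision boundary and trades false negatives for false positives. A pure-Python toy (the data, weight, and function names are all made up for illustration):

```python
def predict(w, b, X):
    """Linear decision rule sign(w . x + b); a score of 0 counts as positive."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
            for x in X]

def fn_fp(y_true, y_pred):
    """Count false negatives and false positives."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return fn, fp

# Toy 1-D dataset with a fixed weight; only the bias b is varied.
X = [[-3.0], [-1.0], [-0.5], [0.5], [1.0], [3.0]]
y = [-1, -1, 1, -1, 1, 1]
w = [1.0]

for b in (-1.0, 0.0, 1.0):
    print("b =", b, "-> (FN, FP) =", fn_fp(y, predict(w, b, X)))
# Raising b reduces false negatives at the price of more false positives:
# b = -1.0 -> (1, 0), b = 0.0 -> (1, 1), b = 1.0 -> (0, 2)
```

With a real model, you would pick the b that gives the error trade-off you want on a validation set.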



If you want to penalize false positives more heavily than false negatives, I think you can pass C as a vector. Please read this and this to understand what C does in an SVM. I am quoting the second post:

"However, it is important here, as in any regularization scheme, to choose the right value for C, the penalty factor. If it is too large, we have a high penalty for nonseparable points and we may store many support vectors and overfit. If it is too small, we may have underfitting." Alpaydin (2004), p. 224.



This way, you can pass a large value of C for false positives and a small value of C for false negatives. There are many other factors you must take care of, such as overfitting and underfitting. Typically, setting a large value for C gives good performance on the training set, but it degrades on the test set due to overfitting. A (very) small value of C can effectively ignore the constraints and produce a suboptimal classifier, i.e. you could get significantly better performance with a different value of C. To avoid this, you can cross-validate. I have never tried this myself, though.
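The cross-validation suggestion amounts to a grid search over candidate C values, scoring each on held-out folds. A rough pure-Python skeleton of that loop (the fold splitter and the score callback are illustrative stand-ins, not part of svmtrain):

```python
def k_fold_indices(n, k):
    """Split sample indices 0..n-1 into k (nearly) equal contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate_C(candidates, n, k, score):
    """Return the candidate C with the best mean held-out score.

    score(C, train_idx, test_idx) stands in for "train an SVM with this C
    on the training fold and evaluate it on the held-out fold".
    """
    best_C, best_mean = None, float("-inf")
    folds = k_fold_indices(n, k)
    for C in candidates:
        mean = sum(score(C, [i for f in folds if f is not fold for i in f], fold)
                   for fold in folds) / k
        if mean > best_mean:
            best_C, best_mean = C, mean
    return best_C

# Dummy score that peaks at C = 1.0, just to exercise the search loop.
best = cross_validate_C([0.1, 1.0, 10.0], n=12, k=3,
                        score=lambda C, tr, te: -abs(C - 1.0))
print(best)  # 1.0
```

In practice, the score callback would train on the training indices and return the validation accuracy (or a cost-weighted error) on the held-out indices.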







