Leave-one-out accuracy for multi-class classification

I am a bit confused about how to use leave-one-out (LOO) cross-validation to calculate accuracy for multi-class classification with one-vs-rest classifiers. I am working on the YUPENN Dynamic Scene Recognition dataset, which contains 14 categories with 30 videos in each category (420 videos total). Let's name the 14 classes {A, B, C, D, E, F, G, H, I, J, K, L, M, N}.

I am using a linear SVM for one-vs-rest classification. Say I want to find the accuracy for class "A". When I do "A" vs "rest", I need to exclude one video during training and test the model on the video that I excluded. Should this excluded video come only from class "A", or from all the classes?

In other words, to determine the accuracy for class "A", should I run the SVM with LOO 30 times (leaving out each video of class "A" exactly once), or 420 times (leaving out each video of every class exactly once)?

I get the feeling that I have it all mixed up. Can anyone provide a short outline of the correct way to evaluate multi-class classification using LOO?
Also, how would I do this using libsvm in Matlab?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The number of videos in the dataset is small, so I cannot afford to create a separate TEST set (the one that should have been sent off to Neptune). Instead, I have to make sure I am making full use of the dataset, because each video provides some new/unique information. In scenarios like this, I've read that people use LOO as a measure of accuracy (when we cannot afford an isolated TEST set). They call it a Leave-One-Video-Out experiment.

The people who worked on Dynamic Scene Recognition used this methodology to report accuracy. To compare the accuracy of my method with theirs, I need to use the same evaluation process. But they only mention that they use LOVO for evaluation; not many details are given. I am new to this area, so it is a little confusing.

As far as I can tell, LOVO can be done in two ways:

1) Leave one video out of the 420. Train 14 one-vs-rest classifiers ("A" vs "rest", "B" vs "rest", ..., "N" vs "rest") using the remaining 419 videos as the training set.

Classify the held-out video with all 14 classifiers and assign it the class with the highest confidence score. Thus one video is classified. We follow the same procedure to label all 420 videos. Using these 420 predicted labels, we can build the confusion matrix and compute false positives/negatives, precision, recall, etc. (a rough code sketch of this option appears after option 2 below).

2) Leave out one video from each of the 14 classes. That means 406 videos for training and 14 for testing. Using the 406 videos, I train 14 one-vs-rest classifiers. I classify each of the 14 test videos and assign each one the label with the maximum confidence score. In the next round I again leave out 14 videos, one from each class, but this time none of them were held out in a previous round. I train again, classify the 14 videos, and record their labels. I continue this process 30 times, with a fresh set of 14 videos each time, so that all 420 videos end up labeled. In this case, too, I compute the confusion matrix, precision, recall, etc. (see the second sketch below).
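To make option 1 concrete, here is a rough, untested sketch of what I think it would look like, assuming the libsvm MATLAB interface (svmtrain/svmpredict) is on the path, that features is a 420 x D matrix of per-video descriptors, and that labels is a 420 x 1 vector of class indices 1..14 (all variable names are my own); please correct me if this is not the right idea:

    % Option 1: leave one video out of all 420, train 14 one-vs-rest SVMs
    % on the remaining 419, and label the held-out video.
    nVideos   = size(features, 1);   % 420
    nClasses  = 14;
    predicted = zeros(nVideos, 1);

    for v = 1:nVideos
        trainIdx = setdiff(1:nVideos, v);              % the other 419 videos
        scores   = zeros(1, nClasses);

        for c = 1:nClasses
            % one-vs-rest: +1 for class c, -1 for everything else
            binLabels = 2 * (labels(trainIdx) == c) - 1;
            model = svmtrain(binLabels, features(trainIdx, :), '-s 0 -t 0 -c 1 -q');

            % dummy test label (1); we only need the decision value
            [~, ~, dec] = svmpredict(1, features(v, :), model, '-q');
            % libsvm orients the decision value towards the first label it
            % saw during training, so flip it if that label was -1
            scores(c) = dec * model.Label(1);
        end

        [~, predicted(v)] = max(scores);               % label for the held-out video
    end

    % 14 x 14 confusion matrix, without needing any toolbox
    confMat = accumarray([labels predicted], 1, [nClasses nClasses]);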
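Option 2 would, as far as I can see, only change how the held-out sets are formed; something like this (again, just a sketch under the same assumptions as above):

    % Option 2: 30 rounds, each holding out one previously unused video
    % per class (406 for training, 14 for testing).
    perClassIdx = arrayfun(@(c) find(labels == c), 1:nClasses, 'UniformOutput', false);

    for r = 1:30
        testIdx  = cellfun(@(idx) idx(r), perClassIdx)';     % 14 held-out videos
        trainIdx = setdiff((1:nVideos)', testIdx);           % 406 training videos
        % ... train the 14 one-vs-rest SVMs on trainIdx and label the 14
        %     held-out videos exactly as in the option-1 sketch ...
    end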

Apart from these two methods, LOVO could be done in many other ways. In the papers on dynamic scene recognition, the authors did not mention how they perform LOVO. Can we assume they are using method 1? Is there a way to decide which method is better? Would there be a significant difference in the accuracy obtained by these two methods?



The following are some recent works on dynamic scene recognition, for reference; their evaluation sections mention LOVO.
1) http://www.cse.yorku.ca/vision/publications/FeichtenhoferPinzWildesCVPR2014.pdf
2) http://www.cse.yorku.ca/~wildes/wildesBMVC2013b.pdf
3) http://www.seas.upenn.edu/~derpanis/derpanis_lecce_daniilidis_wildes_CVPR_2012.pdf
4) http://webia.lip6.fr/~thomen/papers/Theriault_CVPR_2013.pdf
5) http://www.umiacs.umd.edu/~censhroff ./DynSnext pdf


2 answers


When using cross-validation, it is important to keep in mind that it applies to model training, and not usually to the honest-to-goodness, final accuracy measures, which are instead reserved for measuring classification accuracy on a test set that has not been touched at all, or involved in any way, during training.

Let's focus on just one of the classifiers you plan to build, say the "A versus rest" classifier. You are going to split all the data into a training set and a test set, and then you put the test set in a cardboard box, staple it shut, cover it with duct tape, place it in a titanium vault, and strap it to a NASA rocket that will deposit it in the icy oceans of Neptune.

Now let's look at the training set. When we train with the training set, we would like to set some of the training data aside, for calibration purposes only, but not as part of the official Neptune-ocean test set.

So what we can do is have every data point (in your case it appears that a data point is a video-valued object) sit out once. We don't care whether it comes from class A or not. So if there are 420 videos in the training set of the A-versus-rest classifier alone, then yes, you are going to fit 420 different SVMs.

And in fact, if you are tuning hyperparameters for the SVMs, this is where you would do it. For example, if you are trying to pick the penalty term, or a coefficient in a polynomial kernel, or the like, then you would repeat the entire training process (yep, all 420 different trained SVMs) for every combination of parameters you want to search through. And for each parameter set, you would associate with it the aggregate accuracy score from those 420 LOO-trained classifiers.

Once all of this is done, you select the parameter set with the best LOO score and, voilà, that is your A-versus-rest classifier. Rinse and repeat for B versus rest, and so on.
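If it helps, here is a rough sketch of that LOO-based parameter search for the A-versus-rest classifier alone, written against the libsvm MATLAB interface the question mentions. The variable names (trainFeatures, trainLabels) and the C grid are made up for illustration; libsvm's built-in '-v N' cross-validation option is an alternative way to get a similar score.

    % LOO used only to pick the SVM penalty C of the "A versus rest"
    % classifier, on the *training* portion of the data (the test set is
    % on its way to Neptune).
    binLabels = 2 * (trainLabels == 1) - 1;      % class "A" coded as index 1
    Cgrid     = [0.01 0.1 1 10 100];
    looAcc    = zeros(size(Cgrid));
    n         = numel(binLabels);

    for k = 1:numel(Cgrid)
        correct = 0;
        for v = 1:n
            idx   = setdiff(1:n, v);
            model = svmtrain(binLabels(idx), trainFeatures(idx, :), ...
                             sprintf('-s 0 -t 0 -c %g -q', Cgrid(k)));
            pred  = svmpredict(binLabels(v), trainFeatures(v, :), model, '-q');
            correct = correct + (pred == binLabels(v));
        end
        looAcc(k) = correct / n;                 % LOO score for this C
    end

    [~, best]   = max(looAcc);                   % best parameter set...
    finalModelA = svmtrain(binLabels, trainFeatures, ...                 % ...and voila,
                           sprintf('-s 0 -t 0 -c %g -q', Cgrid(best)));  % A versus rest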



With all of this going on, there is one major concern: that you are overfitting to the data, especially if many of the "negative" samples are repeated from class to class.

But that is why you sent that test set to Neptune. Once you are finished with all the LOO-based parameter tuning and have your final classifiers, you run them on your actual test set (retrieved from Neptune), and that will tell you whether the whole thing actually has any predictive power on unseen data.

This whole exercise is obviously computationally expensive, so instead people will sometimes use "Leave-P-Out", where P is much larger than 1. And instead of repeating the process until every sample has spent some time in the held-out group, they will just repeat it a "reasonable" number of times, for varying definitions of reasonable.

In the Leave-P-Out situation, there are algorithms that let you sample the held-out points in a way that fairly represents the classes. So if the "A" samples make up 40% of the data, you might want them to make up about 40% of the held-out set.

This does not really apply to LOO for two reasons: (1) you will almost always perform LOO on every point in the training data, so trying to sample cleverly would be irrelevant when every point ends up being used exactly once anyway; (2) if you plan to run LOO some number of times that is smaller than the sample size (not usually recommended), then points drawn at random from the set will naturally reflect the relative class frequencies, so if you planned to do LOO K times, simply taking a random size-K subsample of the training set and doing regular LOO on that would suffice.
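For what it's worth, a minimal sketch of that stratified Leave-P-Out sampling might look like the following. P, the number of repetitions, and all variable names are arbitrary choices of mine, purely for illustration.

    % Stratified Leave-P-Out: each repetition holds out about P samples
    % while roughly preserving the relative class frequencies.
    P        = 42;                                    % e.g. ~10% of 420 videos
    nReps    = 20;
    nClasses = 14;
    N        = numel(labels);

    for r = 1:nReps
        testIdx = [];
        for c = 1:nClasses
            classIdx = find(labels == c);
            nHold    = round(P * numel(classIdx) / N);    % this class's share of P
            shuffled = classIdx(randperm(numel(classIdx)));
            testIdx  = [testIdx; shuffled(1:nHold)];      %#ok<AGROW>
        end
        trainIdx = setdiff((1:N)', testIdx);
        % ... train on trainIdx, evaluate on testIdx, accumulate the scores ...
    end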



In short, the articles you mentioned use the second scheme, i.e. leave out one video from each class, which gives 14 videos for testing and the rest for training.









