How can I efficiently find the accuracy of a classifier

Even with a simple classifier like nearest neighbor, I cannot judge its accuracy and therefore cannot improve it.

For example, with the code below:

IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = cell([size(test_image_feats, 1), 1]);
for i = 1:size(IDX, 1)
    predicted_categories{i} = train_labels(IDX(i));
end


Here train_image_feats is a 300 x 256 matrix, where each row represents an image. test_image_feats has the same structure. train_labels holds the label corresponding to each row of the training matrix.

The book I am following simply states that this method achieves an accuracy of 19%.

How did the author come to this conclusion? Is there a way to judge the accuracy of my results with this classifier or others?

The author then uses a different feature extraction method and says that he improved the accuracy by 30%.

How can I measure the accuracy, be it graphically or simply as a percentage?

1 answer


Accuracy in machine learning and classification is usually computed by comparing your classifier's predicted results against the ground truth. By the time you evaluate the classification accuracy of your classifier, you will already have built a predictive model using a training set with known inputs and outputs. At this point, you will also have a test set with inputs and outputs that were not used to train the classifier. For the purposes of this post, let's call this dataset the ground truth. This ground truth dataset helps assess the accuracy of your classifier when you provide it with inputs it has never seen before.

You take the inputs from your test set and run them through your classifier. You get an output for each input, and we call the collection of those outputs the predicted values.

For each predicted value, you compare it to the corresponding ground truth value and see if it matches. You count all of the instances where the predicted output matches the ground truth. Dividing this count by the total number of points in the test set gives you the fraction of instances where your model correctly predicted the outcome relative to the ground truth.

In MATLAB, this is very easy to calculate. Let's assume that the categories for your model are enumerated from 1 to N, where N is the total number of labels you are classifying with. Let groundTruth be your vector of labels denoting the ground truth, and let predictedLabels denote the labels generated by your classifier. The accuracy is then calculated simply as:

accuracy = sum(groundTruth == predictedLabels) / numel(groundTruth);
accuracyPercentage = 100*accuracy;


The first line of code calculates the accuracy of your model as a fraction. The second line expresses it as a percentage, simply by multiplying the first result by 100. You can use either form when reporting accuracy: one is normalized to [0,1] and the other to between 0% and 100%. What groundTruth == predictedLabels does is compare every element of groundTruth with the corresponding element of predictedLabels. If the i-th value in groundTruth matches the i-th value in predictedLabels, the comparison outputs a 1; if not, it outputs a 0. The result is a vector of 0s and 1s, so summing all of the values equal to 1, which is neatly handled by the sum operation, counts the matches. We then divide by the total number of points in our test set to get the final accuracy of the classifier.
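Note that in the code from your question, the labels live in cell arrays (presumably of strings), so the == comparison above will not work on them directly. Here is a minimal sketch of the same idea for string labels, assuming a cell array test_labels (not shown in your question) holds the ground truth label for each row of test_image_feats:

% Nearest neighbour prediction, vectorised: replaces the loop in the question.
IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = train_labels(IDX);

% strcmp compares two equally sized cell arrays of strings element-wise
% and returns a logical vector with 1 wherever the labels match.
% The (:) reshaping guards against orientation mismatches.
matches = strcmp(test_labels(:), predicted_categories(:));
accuracy = sum(matches) / numel(matches);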



As a toy example, let's say I had 4 labels, and my groundTruth and predictedLabels vectors were as follows:

groundTruth =     [1 2 3 2 3 4 1 1 2 3 3 4 1 2 3];
predictedLabels = [1 2 2 4 4 4 1 2 3 3 4 1 2 3 3];


Computing the accuracy with the above vectors gives us:

>> accuracy

accuracy =

    0.4000

>> accuracyPercentage

accuracyPercentage =

    40


This means we have 40% accuracy, or 0.40. In this example, the prediction model was able to correctly classify 40% of the test set when each test input was run through the classifier. This makes sense, because only 6 of the 15 outputs match between our predicted outputs and the ground truth: elements 1, 2, 6, 7, 10 and 15. There are other ways to evaluate a classifier, such as ROC curves, but when calculating accuracy in machine learning, this is usually how it is done.
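If you want something more graphical, or a per-class breakdown rather than a single number, a confusion matrix is a common next step. Here is a minimal sketch using confusionmat (it requires the Statistics Toolbox), applied to the toy vectors above:

% Confusion matrix: rows correspond to true labels, columns to predicted labels.
C = confusionmat(groundTruth, predictedLabels);

% Per-class accuracy: the diagonal holds the correctly classified counts;
% divide by the total number of true instances of each class (the row sums).
perClassAccuracy = diag(C) ./ sum(C, 2);

% Visualise the confusion matrix as a heat map.
imagesc(C); colorbar;
xlabel('Predicted label'); ylabel('True label');

The overall accuracy from before is just sum(diag(C)) / sum(C(:)).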
