Problems understanding sklearn SVM predict_proba function
I'm having trouble understanding a function from sklearn and would like some clarification. At first I thought that sklearn SVM's predict_proba function gives the classifier's level of confidence in its prediction, but after experimenting with it in my emotion recognition program, I started to doubt that and suspect I misunderstood how predict_proba works.
For example, my code is set up something like this:
# Split the data into train/test sets (cross validation),
# train the SVM, and report accuracy on the test data
from sklearn import cross_validation
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(main, target, test_size=0.4)
model = SVC(probability=True)
model.fit(features_train, labels_train)
pred = model.predict(features_test)
accuracy = accuracy_score(labels_test, pred)
print accuracy
# Code that records a video of 17 frames and builds a matrix known as
# sub_main with features that will be fed into the SVM
# A few lines of code later . . .
import numpy as np

pred = model.predict(sub_main)
prob = model.predict_proba(sub_main)
prob_s = np.around(prob, decimals=5)
prob_s = prob_s * 100
print ''
print 'Prediction: '
print pred
print 'Probability: '
print 'Neutral: ', prob_s[0,0]
print 'Smiling: ', prob_s[0,1]
print 'Shocked: ', prob_s[0,2]
print 'Angry: ', prob_s[0,3]
print ''
And when I check it, it gives me something like this:
Prediction:
['Neutral']
Probability:
Neutral: 66.084
Smiling: 17.875
Shocked: 11.883
Angry: 4.157
The classifier was 66% confident that the correct classification was "Neutral": 66 was the number printed next to Neutral, and it was the highest one. The largest probability lined up with the actual prediction, and I was happy about that.
But then I got this:
Prediction:
['Angry']
Probability:
Neutral: 99.309
Smiling: 0.16
Shocked: 0.511
Angry: 0.02
It predicted "Angry" (which is the correct classification, by the way), yet it printed a 99.3 percent confidence next to "Neutral". The highest confidence level (the largest number) was assigned to Neutral, even though the prediction was something completely different.
Sometimes it also does this:
Prediction:
['Smiling']
Probability:
Neutral: 0.0
Smiling: 0.011
Shocked: 0.098
Angry: 99.891
Prediction:
['Angry']
Probability:
Neutral: 99.982
Smiling: 0.0
Shocked: 0.016
Angry: 0.001
I don't understand how SVM's predict_proba function works, and I would like some clarification on what it does and on what is happening in my code.
I don't know much about how SVC works internally, so you may want to look at the comments for anything that completes this answer.
You should be aware that predict_proba returns the probabilities for the categories in lexicographic order, i.e. the order in which they appear in the classes_ attribute. This is stated in the docs.
You have to account for that ordering when you print your results. We can see from your examples that Angry sits at the first index (your labels sort as Angry, Neutral, Shocked, Smiling), and with that mapping your results are consistent except for the first one. (That remaining mismatch may be the separate, documented scikit-learn caveat that SVC computes its probability estimates with Platt scaling, so predict_proba can occasionally disagree with predict.)
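To double-check the ordering on your own model, you can print the classes_ attribute directly; the columns of predict_proba follow it. A minimal check, assuming the model variable from your code:
# predict_proba columns follow model.classes_, which is
# sorted lexicographically when the model is fit.
print model.classes_
# With your four labels this prints:
# ['Angry' 'Neutral' 'Shocked' 'Smiling']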
try this:
print 'Neutral: ', prob_s[0,1]
print 'Smiling: ', prob_s[0,3]
print 'Shocked: ', prob_s[0,2]
print 'Angry: ', prob_s[0,0]
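A more robust fix is to not hardcode the indices at all, and instead pair each probability with its label straight from classes_, so the printout can never drift out of order. A sketch using the model and prob_s variables from your code:
# Zip each class label with its probability column so the labels
# always line up with the numbers, whatever order classes_ is in.
for label, p in zip(model.classes_, prob_s[0]):
    print '%s: %.3f' % (label, p)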