Setting a threshold in the output of a classifier in Python

Question


Assume I have a trained SVM classifier in Python with the probability=True flag set, as in:

classifier = SVC(C = 1000000, gamma = 10, probability=True) 
classifier.fit(my_data, the_labels)

When I classify new data, I only want to keep the predictions whose probability is above a threshold, say 0.90. How can I do this? So far I have something like this, but I am stuck:

labels_predicted = classifier.predict(new_data)
probabilities = classifier.predict_proba(new_data)

The first command returns the predicted labels, and the second returns the probabilities of every label for each data point. So, for each data point, I have a most likely label and the probabilities associated with all labels. But the probability of the most likely label might be only 0.4, which I don't want to keep. How can I store only the labels whose probability exceeds a specific threshold?


python classification

gelazari Apr 21 15 at 12:55


1 answer

Sudeep juvekar · Accepted Answer · 2015-04-21T13:12:38+0000

As far as I know, SVC by itself does not let you apply a probability threshold when predicting. You can do a second indexing pass to get the accepted labels after you build labels_predicted and probabilities.

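import pandas

thresh = 0.9
# Boolean mask: True where the most probable label exceeds the threshold
accepted_probabilities_idx = probabilities.max(axis=1) > thresh
accepted_labels_predicted = labels_predicted[accepted_probabilities_idx]
# Keep only the rows of new_data whose prediction was accepted
accepted_new_data = pandas.DataFrame(new_data).loc[accepted_probabilities_idx]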
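To try the indexing pass end to end, here is a minimal sketch on synthetic data. The make_blobs data, the default SVC settings and random_state=0 are my assumptions, not values from the question. It takes the label from the argmax of predict_proba rather than from predict, because the scikit-learn documentation notes that the two can disagree for SVC.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic stand-ins for my_data, the_labels and new_data (assumptions).
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=0)
my_data, the_labels = X[:200], y[:200]
new_data = X[200:]

# Default C and gamma here; the values in the question are specific to its data.
classifier = SVC(probability=True, random_state=0)
classifier.fit(my_data, the_labels)

# One row per sample, one column per class.
probabilities = classifier.predict_proba(new_data)

thresh = 0.9
top_probability = probabilities.max(axis=1)
accepted = top_probability > thresh  # boolean mask, one entry per row

# Columns of predict_proba follow classifier.classes_,
# so the argmax column index maps back to a label.
top_label = classifier.classes_[probabilities.argmax(axis=1)]

accepted_labels = top_label[accepted]
accepted_new_data = new_data[accepted]
accepted_top_probability = top_probability[accepted]

print(f"kept {accepted.sum()} of {len(new_data)} points")
```

How many points survive depends on how well separated the classes are, so the count will differ on real data.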

I'm not sure what you want to do with data whose most likely label has a low probability. This approach discards it entirely.
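If discarding is not what you want, one alternative is to keep every row and mark the low-confidence ones with a placeholder label. The sketch below is only an illustration: label_or_reject and reject_label are hypothetical names of mine, and it assumes integer class labels where -1 is not a real class.

```python
import numpy as np

def label_or_reject(probabilities, classes, thresh=0.9, reject_label=-1):
    """Return the most probable class per row, or reject_label if it is not above thresh."""
    probabilities = np.asarray(probabilities)
    classes = np.asarray(classes)
    top = probabilities.argmax(axis=1)  # column index of the best class per row
    top_probability = probabilities[np.arange(len(probabilities)), top]
    confident = top_probability > thresh
    return np.where(confident, classes[top], reject_label)

# Hand-made probabilities for three points over classes 0, 1, 2.
example_probabilities = np.array([
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.02, 0.93],
])
result = label_or_reject(example_probabilities, classes=[0, 1, 2])
print(result)  # rows 0 and 2 keep their labels, row 1 becomes -1
```

With a fitted classifier you would pass classifier.predict_proba(new_data) and classifier.classes_ as the two arguments.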
