Setting a threshold in the output of a classifier in Python

Assuming I have a trained SVM classifier in Python (scikit-learn) with the probability=True flag set:

from sklearn.svm import SVC

classifier = SVC(C=1000000, gamma=10, probability=True)
classifier.fit(my_data, the_labels)

      

When I classify new data, I only want to keep the points whose predicted label has a probability above a threshold, say 0.90. How can I do this? So far I am doing something like this, but I am stuck:

labels_predicted = classifier.predict(new_data)
probabilities = classifier.predict_proba(new_data)

      

The first command returns the predicted labels, and the second returns the probabilities of all labels for each data point. So, for each data point, I have a maximum-likelihood label and the probabilities of every label. But the probability of the maximum-likelihood label might be only 0.4, which I don't want. How can I keep only the labels whose probability exceeds a specific threshold?



1 answer


As far as I know, SVC has no built-in option for returning only the predictions whose probability exceeds a threshold. You can do a second indexing pass and select the accepted labels after you build labels_predicted and probabilities.

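import pandas

thresh = 0.9
# Boolean mask: True where the highest class probability exceeds the threshold
accepted_probabilities_idx = probabilities.max(axis=1) > thresh
accepted_labels_predicted = labels_predicted[accepted_probabilities_idx]
# Filter rows with the mask; passing it as index= would only relabel the rows
accepted_new_data = pandas.DataFrame(new_data)[accepted_probabilities_idx]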

      



I'm not sure what you want to do with data where the maximum-likelihood label has a low probability. This approach discards it entirely.
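
For reference, here is a self-contained sketch of the same thresholding idea that can be run end to end. The synthetic data from make_blobs, the default SVC hyperparameters, and the names accepted, labels_from_proba and rejected_new_data are assumptions made for illustration; they are not from the question. The sketch takes the labels from the probability matrix (via classifier.classes_) rather than from predict, because scikit-learn documents that for SVC the two can occasionally disagree: the probabilities come from a separate calibration step.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic stand-ins for my_data, the_labels and new_data (assumptions).
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=0)
my_data, the_labels = X[:200], y[:200]
new_data = X[200:]

classifier = SVC(probability=True, random_state=0)
classifier.fit(my_data, the_labels)

# One row per sample, one column per class (column order = classifier.classes_).
probabilities = classifier.predict_proba(new_data)

thresh = 0.9
# Boolean mask: True where the most probable class exceeds the threshold.
accepted = probabilities.max(axis=1) > thresh

# Read the label off the probability matrix so label and score always agree.
labels_from_proba = classifier.classes_[probabilities.argmax(axis=1)]

accepted_labels = labels_from_proba[accepted]
accepted_new_data = new_data[accepted]
# Low-confidence points are set aside here instead of being thrown away.
rejected_new_data = new_data[~accepted]

print(f"kept {accepted.sum()} of {len(new_data)} points")
```

Keeping rejected_new_data around is one possible answer to the question of what to do with low-probability points: they can be inspected, labeled by hand, or passed to another model instead of being dropped.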
