Which decision_function_shape should I use for sklearn.svm.SVC with OneVsRestClassifier?
I am doing multi-label classification, where I am trying to predict the correct tags for each question (X = questions, y = list of tags for each question in X).
I am wondering which decision_function_shape for sklearn.svm.SVC I should use with OneVsRestClassifier.
From the docs, we can read that decision_function_shape can take two values, 'ovo' and 'ovr':
decision_function_shape : 'ovo', 'ovr' or None, default=None
Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm, which has shape (n_samples, n_classes * (n_classes - 1) / 2). The default of None will currently behave as 'ovo' for backward compatibility and raise a deprecation warning, but will change to 'ovr' in 0.19.
But I still don't understand what is the difference between:
# First decision_function_shape set to 'ovo'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape ='ovo'))
# Second decision_function_shape set to 'ovr'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape ='ovr'))
Which decision_function_shape should I use for a multi-label problem?
EDIT: Question asking for a similar thing with no answer.
I think the question of which one to use is best decided case by case, and it can easily be made part of your GridSearch. Intuitively, though, I would expect that as far as differences go, both settings end up doing the same thing here. Here's my reasoning:
OneVsRestClassifier is designed to model each class against all other classes independently, and it creates one classifier per class. The way I understand the process, OneVsRestClassifier takes a class and creates a binary label for whether a point belongs to that class or not. This labelling is then fed into whatever estimator you have chosen. I believe the confusion arises because SVC also lets you make the same choice, but with this setup the choice doesn't matter, because you will only ever feed two classes into SVC.
And here's an example:
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
data = load_iris()
X, y = data.data, data.target
estim1 = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovo'))
estim1.fit(X,y)
estim2 = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovr'))
estim2.fit(X,y)
print(estim1.coef_ == estim2.coef_)
array([[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
So you can see that the coefficients are equal for all three estimators built by the two models. Admittedly, this dataset only contains 150 samples and 3 classes, so the results could conceivably differ on a more complex dataset, but it serves as a simple proof of concept.
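As a further check on the reasoning above, here is a minimal sketch that inspects the fitted sub-estimators directly. It assumes a reasonably recent scikit-learn (for `load_iris(return_X_y=True)`); the variable names `ovo`, `ovr`, `classes_seen`, `same_predictions` and `same_scores` are purely illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same wrapper, same kernel; only decision_function_shape differs.
ovo = OneVsRestClassifier(SVC(kernel="linear", decision_function_shape="ovo")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear", decision_function_shape="ovr")).fit(X, y)

# Each wrapped SVC is trained on a binary "this class vs. the rest" target.
classes_seen = [len(est.classes_) for est in ovo.estimators_]

# With only two classes per SVC, both settings should give the same results.
same_predictions = np.array_equal(ovo.predict(X), ovr.predict(X))
same_scores = np.allclose(ovo.decision_function(X), ovr.decision_function(X))

print(classes_seen, same_predictions, same_scores)
```

If every inner SVC reports two classes, the one-vs-one and one-vs-rest shapes coincide for it, which is why the setting has no visible effect inside OneVsRestClassifier.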
The shapes of the decision functions differ because ovo trains one classifier for each pair of classes, whereas ovr trains one classifier per class, fitted against all other classes.
The best example I could find is in the documentation at http://scikit-learn.org :
SVC and NuSVC implement the "one-against-one" approach (Knerr et al., 1990) for multi-class classification. If n_class is the number of classes, then n_class * (n_class - 1) / 2 classifiers are constructed and each one trains data from two classes. To provide a consistent interface with other classifiers, the decision_function_shape option allows aggregating the results of the "one-against-one" classifiers to a decision function of shape (n_samples, n_classes):
>>> from sklearn import svm
>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> clf = svm.SVC(decision_function_shape='ovo')
>>> clf.fit(X, Y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovo', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes: 4*3/2 = 6
6
>>> clf.decision_function_shape = "ovr"
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes
4
What does this mean in simple terms?
To understand what n_class * (n_class - 1) / 2 means, generate all two-class combinations with itertools.combinations.
def ovo_classifiers(classes):
    import itertools
    n_class = len(classes)
    n = n_class * (n_class - 1) / 2
    combos = itertools.combinations(classes, 2)
    return (n, list(combos))
>>> ovo_classifiers(['a', 'b', 'c'])
(3.0, [('a', 'b'), ('a', 'c'), ('b', 'c')])
>>> ovo_classifiers(['a', 'b', 'c', 'd'])
(6.0, [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')])
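The same count can be checked against scikit-learn's own meta-estimators. The following is a rough sketch, assuming the bundled digits dataset (10 classes) and a recent scikit-learn; `expected_pairs`, `n_ovo` and `n_ovr` are illustrative names, not part of any API.

```python
from itertools import combinations

from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
n_class = len(set(y))  # 10 digit classes

# Number of unordered class pairs: n_class * (n_class - 1) / 2
expected_pairs = len(list(combinations(range(n_class), 2)))

# One-vs-one fits one binary classifier per pair of classes.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
# One-vs-rest fits one binary classifier per class.
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

n_ovo = len(ovo.estimators_)
n_ovr = len(ovr.estimators_)
print(expected_pairs, n_ovo, n_ovr)
```

The one-vs-one count grows quadratically with the number of classes while the one-vs-rest count grows linearly, which matters when there are many tags.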
Which estimator should be used for multi-label classification?
In your situation, each question can carry multiple tags (like here on StackOverflow). If you know your labels (classes) beforehand, I would suggest OneVsRestClassifier(LinearSVC()), but you could also try DecisionTreeClassifier or RandomForestClassifier (I think):
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC, LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
df = pd.DataFrame({
    'Tags': [['python', 'pandas'], ['c#', '.net'], ['ruby'],
             ['python'], ['c#'], ['sklearn', 'python']],
    'Questions': ['This is a post about python and pandas is great.',
                  'This is a c# post and i hate .net',
                  'What is ruby on rails?', 'who else loves python',
                  'where to learn c#', 'sklearn is a python package for machine learning']},
    columns=['Questions', 'Tags'])
X = df['Questions']
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['Tags'].values)
pipeline = Pipeline([
    ('vect', CountVectorizer(token_pattern='|'.join(mlb.classes_))),
    ('linear_svc', OneVsRestClassifier(LinearSVC()))
])
pipeline.fit(X, y)
final = pd.DataFrame(pipeline.predict(X), index=X, columns=mlb.classes_)
def predict(text):
    return pd.DataFrame(pipeline.predict(text), index=text, columns=mlb.classes_)
test = ['is python better than c#', 'should i learn c#',
        'should i learn sklearn or tensorflow',
        'ruby or c# i am a dinosaur',
        'is .net still relevant']
print(predict(test))
Output:
                                      .net  c#  pandas  python  ruby  sklearn
is python better than c#                 0   1       0       1     0        0
should i learn c#                        0   1       0       0     0        0
should i learn sklearn or tensorflow     0   0       0       0     0        1
ruby or c# i am a dinosaur               0   1       0       0     1        0
is .net still relevant                   1   0       0       0     0        0
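To make the 0/1 columns above less opaque, here is a small sketch of what MultiLabelBinarizer does to the tag lists and how an indicator matrix can be turned back into tags with inverse_transform. The three tag lists are a shortened, illustrative subset of the data above.

```python
from sklearn.preprocessing import MultiLabelBinarizer

tags = [['python', 'pandas'], ['c#', '.net'], ['ruby']]

mlb = MultiLabelBinarizer()
# One row per question, one 0/1 column per known tag.
y = mlb.fit_transform(tags)

# Columns follow the sorted order of the tags seen during fit.
classes = list(mlb.classes_)

# Map an indicator matrix (e.g. the output of pipeline.predict) back to tags.
recovered = mlb.inverse_transform(y)

print(classes)
print(y)
print(recovered)
```

Applying `mlb.inverse_transform` to the output of a fitted multi-label classifier gives one tuple of tags per input, which is usually easier to read than the raw indicator matrix.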