What's the right decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?

I am doing multi-label classification, where I am trying to predict the correct tags for each question:

(X = questions, y = tag list for each question from X).

I am wondering which decision_function_shape for sklearn.svm.SVC I should use with OneVsRestClassifier?

From the docs, we can read that decision_function_shape can take two values, 'ovo' and 'ovr':

decision_function_shape : 'ovo', 'ovr' or None, default=None

Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm, which has shape (n_samples, n_classes * (n_classes - 1) / 2). The default of None will currently behave as 'ovo' for backward compatibility and raise a deprecation warning, but will change to 'ovr' in 0.19.

But I still don't understand what the difference is between:

# First decision_function_shape set to 'ovo'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape ='ovo'))

# Second decision_function_shape set to 'ovr'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape ='ovr'))

      

Which decision_function_shape should I use for a multi-label problem?

EDIT: Question asking for a similar thing with no answer.


2 answers


I think the question of which one to use is best left to the situation, and it could easily be part of your grid search. Intuitively, though, I would expect that as far as differences go, you will be doing the same thing either way. Here's my reasoning:

OneVsRestClassifier is designed to model each class against all other classes independently, creating one classifier per class. The way I understand the process, OneVsRestClassifier takes a class and creates a binary label for whether each point belongs to that class or not. This labelling is then fed into whatever estimator you have chosen to use. I believe the confusion arises because SVC also lets you make this same choice, but with this setup the choice doesn't matter, because you will only ever feed two classes into SVC.

And here's an example:



from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

data = load_iris()

X, y = data.data, data.target
estim1 = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovo'))
estim1.fit(X,y)

estim2 = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovr'))
estim2.fit(X,y)

print(estim1.coef_ == estim2.coef_)
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

      

So you can see that the coefficients are equal for all three estimators built by the two models. Admittedly, this dataset has only 150 samples and 3 classes, so it is possible the results could differ for a more complex dataset, but it is a simple proof of concept.
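The same check can be sketched through the fitted inner estimators, which may be useful on scikit-learn versions where the OneVsRestClassifier wrapper itself does not expose coef_ (an assumption about your installed version, not something stated above). The names fitted, coef_ovo and coef_ovr are illustrative only. The sketch also confirms that each inner SVC sees just two classes and that the wrapper's decision function has shape (n_samples, n_classes) under both settings:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Fit the same wrapper twice, changing only decision_function_shape.
fitted = {}
for shape in ("ovo", "ovr"):
    est = OneVsRestClassifier(SVC(kernel="linear", decision_function_shape=shape))
    fitted[shape] = est.fit(X, y)

# Each inner SVC was trained on a binary "this class vs. the rest" target.
inner_class_counts = [len(svc.classes_) for svc in fitted["ovo"].estimators_]

# Stack the per-class coefficients taken from the inner estimators.
coef_ovo = np.vstack([svc.coef_ for svc in fitted["ovo"].estimators_])
coef_ovr = np.vstack([svc.coef_ for svc in fitted["ovr"].estimators_])

# The wrapper's decision function has one column per class in both cases.
dec_ovo = fitted["ovo"].decision_function(X)
dec_ovr = fitted["ovr"].decision_function(X)

print(inner_class_counts)
print(np.allclose(coef_ovo, coef_ovr))
print(dec_ovo.shape, dec_ovr.shape)
```

Because decision_function_shape only changes how a multiclass SVC reports its decision values, and each inner SVC here is binary, the two wrappers should behave identically.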



The shapes of the decision functions differ because ovo trains a classifier for each pair of classes, whereas ovr trains one classifier for each class fitted against all other classes.

The best example I could find can be found here at http://scikit-learn.org :

SVC and NuSVC implement the "one-against-one" approach (Knerr et al., 1990) for multi-class classification. If n_class is the number of classes, then n_class * (n_class - 1) / 2 classifiers are constructed, and each one is trained on data from two classes. To provide a consistent interface with other classifiers, the decision_function_shape option allows the results of the "one-against-one" classifiers to be aggregated into a decision function of shape (n_samples, n_classes):

>>> from sklearn import svm
>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> clf = svm.SVC(decision_function_shape='ovo')
>>> clf.fit(X, Y) 
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovo', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes: 4*3/2 = 6
6
>>> clf.decision_function_shape = "ovr"
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes
4

      

What does this mean in simple terms?

To understand what n_class * (n_class - 1) / 2 means, generate the two-class combinations with itertools.combinations.



import itertools

def ovo_classifiers(classes):
    # One one-vs-one classifier is needed per unordered pair of classes
    n_class = len(classes)
    n = n_class * (n_class - 1) // 2  # integer division keeps the count an int
    combos = itertools.combinations(classes, 2)
    return (n, list(combos))

>>> ovo_classifiers(['a', 'b', 'c'])
(3, [('a', 'b'), ('a', 'c'), ('b', 'c')])
>>> ovo_classifiers(['a', 'b', 'c', 'd'])
(6, [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')])
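To tie the pair counting above back to SVC itself, here is a minimal sketch that compares the number of class pairs with the number of columns a plain (unwrapped) SVC returns. The choice of the digits dataset (10 classes) and the variable names are assumptions made for illustration:

```python
from itertools import combinations

from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# A plain multiclass SVC, not wrapped in OneVsRestClassifier.
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)

# One column per unordered pair of classes under 'ovo'.
pairs = list(combinations(clf.classes_, 2))
n_columns_ovo = clf.decision_function(X[:5]).shape[1]

# Switching the attribute on the fitted model aggregates to one column per class.
clf.decision_function_shape = "ovr"
n_columns_ovr = clf.decision_function(X[:5]).shape[1]

print(len(pairs), n_columns_ovo, n_columns_ovr)
```

With 10 classes there are 10 * 9 / 2 = 45 pairs, so the 'ovo' output should have 45 columns and the 'ovr' output 10.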

      

Which estimator should be used for multi-label classification?

In your situation, each question can have multiple tags (as here on StackOverflow). If you know your labels (classes) in advance, I would suggest OneVsRestClassifier(LinearSVC()), but you could also try DecisionTreeClassifier or RandomForestClassifier (I think):

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC, LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

df = pd.DataFrame({
  'Tags': [['python', 'pandas'], ['c#', '.net'], ['ruby'],
           ['python'], ['c#'], ['sklearn', 'python']],
  'Questions': ['This is a post about python and pandas is great.',
           'This is a c# post and i hate .net',
           'What is ruby on rails?', 'who else loves python',
           'where to learn c#', 'sklearn is a python package for machine learning']},
                  columns=['Questions', 'Tags'])

X = df['Questions']
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['Tags'].values)

pipeline = Pipeline([
  ('vect', CountVectorizer(token_pattern='|'.join(mlb.classes_))),
  ('linear_svc', OneVsRestClassifier(LinearSVC()))
  ])
pipeline.fit(X, y)

final = pd.DataFrame(pipeline.predict(X), index=X, columns=mlb.classes_)

def predict(text):
  return pd.DataFrame(pipeline.predict(text), index=text, columns=mlb.classes_)

test = ['is python better than c#', 'should i learn c#',
        'should i learn sklearn or tensorflow',
        'ruby or c# i am a dinosaur',
        'is .net still relevant']
print(predict(test))

      

Output:

                                      .net  c#  pandas  python  ruby  sklearn
is python better than c#                 0   1       0       1     0        0
should i learn c#                        0   1       0       0     0        0
should i learn sklearn or tensorflow     0   0       0       0     0        1
ruby or c# i am a dinosaur               0   1       0       0     1        0
is .net still relevant                   1   0       0       0     0        0
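If you need tag lists rather than a 0/1 indicator matrix, MultiLabelBinarizer.inverse_transform can map predictions back to labels. The following is only a sketch: the tags list is a shortened stand-in for the training tags, and the predicted matrix is made up for illustration rather than produced by the pipeline above:

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Shortened, illustrative tag lists (one list per question).
tags = [['python', 'pandas'], ['c#', '.net'], ['ruby']]
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# Hypothetical predictions: one row per question, columns ordered as mlb.classes_,
# i.e. ['.net', 'c#', 'pandas', 'python', 'ruby'].
predicted = np.array([[0, 1, 0, 1, 0],
                      [1, 0, 0, 0, 0]])

# Map each indicator row back to a tuple of tag names.
decoded = mlb.inverse_transform(predicted)
print(list(mlb.classes_))
print(decoded)
```

In the pipeline above, the same call would be applied to pipeline.predict(test) with the already fitted mlb.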

      
