How to calculate confusion matrix for multiclass classification in Scikit?

I have a multi-class classification task. When I run my script, based on a scikit-learn example, like this:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

classifier = OneVsRestClassifier(
    GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))

y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)

I am getting this error:

File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported

      

I tried passing labels=classifier.classes_ to confusion_matrix(), but it doesn't help.

y_test and y_pred are as follows:

y_test =
array([[0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0]])


y_pred =
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0]])

+3

3 answers


First you need to convert the one-hot output back to an array of label indices. Say you have 3 classes, 'cat', 'dog', 'house', indexed 0, 1, 2, and the prediction for 2 samples is 'dog' and 'house'. Your output would be:

y_pred = [[0, 1, 0],[0, 0, 1]]




Run y_pred.argmax(1) to get [1, 2]. This array holds the original label indices, which here means: ['dog', 'house'].

import numpy as np
from keras.utils import np_utils  # to_categorical lives here in older Keras

num_classes = 3

# from label indices to one-hot (categorical) encoding
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from one-hot encoding back to label indices
y_pred = y_categorical.argmax(1)
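
For completeness, a minimal sketch (toy data, illustrative only) of feeding the argmax-decoded labels into confusion_matrix:

import numpy as np
from sklearn.metrics import confusion_matrix

# made-up one-hot ground truth and predictions
y_test = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
y_pred = np.array([[0, 0, 1], [1, 0, 0], [1, 0, 0]])

# decode one-hot rows to label indices, then build the matrix
cnf_matrix = confusion_matrix(y_test.argmax(1), y_pred.argmax(1))
print(cnf_matrix)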


+4




This worked for me:

import numpy as np
from sklearn.metrics import confusion_matrix

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)




where y_test and y_predict are one-hot encoded categorical arrays.
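
A self-contained toy example of the same idea (data made up for illustration):

import numpy as np
from sklearn.metrics import confusion_matrix

y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_predict = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1]])

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)
print(conf_mat)
# [[1 0 0]
#  [0 0 1]
#  [0 0 1]]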

+1




I simply subtracted the ground-truth matrix y_test from the prediction matrix y_pred while keeping the categorical (one-hot) format, treating a -1 as a false negative and a 1 as a false positive.

Further:

if output_matrix[i,j] == 1 and predictions_matrix[i,j] == 1:
    produced_matrix[i,j] = 2


This completes the following notation:

  • -1: false negative
  • 1: false positive
  • 0: true negative
  • 2: true positive

Finally, with some naive counting over this matrix, you can derive any confusion-based metric.
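
A minimal sketch of this subtraction trick (array names and data are illustrative; output_matrix is the one-hot ground truth, predictions_matrix the one-hot predictions):

import numpy as np

# made-up one-hot ground truth and predictions
output_matrix = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
predictions_matrix = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 1]])

# subtraction: -1 = false negative, 1 = false positive, 0 = true negative (for now)
produced_matrix = predictions_matrix - output_matrix

# mark true positives (a 1 in both matrices) with a 2
produced_matrix[(output_matrix == 1) & (predictions_matrix == 1)] = 2

# naive counts for each cell category
fn = np.sum(produced_matrix == -1)
fp = np.sum(produced_matrix == 1)
tn = np.sum(produced_matrix == 0)
tp = np.sum(produced_matrix == 2)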

0








