How to calculate confusion matrix for multiclass classification in Scikit?
I have a multi-class classification task. When I run my script, which is based on a scikit-learn example, like this:
classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
I am getting this error:
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
I tried passing labels=classifier.classes_ to confusion_matrix(), but it doesn't help.
y_test and y_pred are as follows:
y_test =
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0]])
y_pred =
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
First you need to convert the output into an array of label indices. Say you have 3 classes, 'cat', 'dog', 'house', indexed 0, 1, 2, and the prediction for 2 samples is 'dog', 'house'. Your output would be:
y_pred = [[0, 1, 0], [0, 0, 1]]
Run y_pred.argmax(1) to get [1, 2]. This array holds the original label indices, meaning ['dog', 'house']:
import numpy as np
from keras.utils import np_utils

num_classes = 3

# from label indices to categorical (one-hot)
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from categorical (one-hot) back to label indices
y_pred = y_categorical.argmax(1)
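To get back to the human-readable labels, the recovered indices can be used to index the label list. A small illustrative sketch, assuming the 'cat'/'dog'/'house' labels from above:

import numpy as np

labels = ['cat', 'dog', 'house']  # class index -> name, as assumed above

y_pred = np.array([[0, 1, 0], [0, 0, 1]])
indices = y_pred.argmax(1)            # -> array([1, 2])
names = [labels[i] for i in indices]  # -> ['dog', 'house']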
This worked for me:
import numpy as np
from sklearn.metrics import confusion_matrix

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)
where y_test and y_predict are one-hot encoded (categorical) arrays.
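As a side note, since these are NumPy arrays, the list comprehensions can be replaced by a single vectorized argmax along axis 1. A minimal equivalent sketch, assuming the same y_test and y_predict one-hot arrays:

import numpy as np
from sklearn.metrics import confusion_matrix

# argmax along axis 1 collapses each one-hot row to its class index
conf_mat = confusion_matrix(y_test.argmax(axis=1), y_predict.argmax(axis=1))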
I simply subtracted the ground-truth matrix y_test from the prediction matrix y_pred while keeping the categorical (one-hot) format. I treated a -1 as a false negative and a 1 as a false positive.

Then, to separate true positives from true negatives:

if output_matrix[i,j] == 1 and predictions_matrix[i,j] == 1:
    produced_matrix[i,j] = 2
This yields the following encoding:
- -1: false negative
- 1: false positive
- 0: true negative
- 2: true positive
Finally, with some naive counting over this matrix, you can derive any confusion-matrix statistic.
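A minimal NumPy sketch of this scheme (the array names output_matrix and predictions_matrix follow the answer; the sample data is illustrative):

import numpy as np

# illustrative one-hot ground truth and predictions
output_matrix = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
predictions_matrix = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 1]])

# pred - truth: -1 = false negative, 1 = false positive, 0 = true negative (or true positive, for now)
produced_matrix = predictions_matrix - output_matrix

# mark true positives (a 1 in both matrices) with 2
produced_matrix[(output_matrix == 1) & (predictions_matrix == 1)] = 2

# naive per-class counts (columns are classes)
tp = (produced_matrix == 2).sum(axis=0)
fp = (produced_matrix == 1).sum(axis=0)
fn = (produced_matrix == -1).sum(axis=0)
tn = (produced_matrix == 0).sum(axis=0)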