How to calculate confusion matrix for multiclass classification in Scikit?
I have a multi-class classification task. When I run my script, which is based on a scikit-learn example, like this:
classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
I am getting this error:
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
I tried passing labels=classifier.classes_ to confusion_matrix(), but it doesn't help.
y_test and y_pred are as follows:
y_test =
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0]])
y_pred =
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
First you need to convert the output into an array of label indices. Say you have 3 classes, 'cat', 'dog', 'house', indexed 0, 1, 2, and the prediction for 2 samples is 'dog', 'house'. Your output would be:
y_pred = [[0, 1, 0], [0, 0, 1]]
Run y_pred.argmax(1) to get [1, 2]. This array holds the original label indices, meaning ['dog', 'house']:
import numpy as np
from keras.utils import np_utils

num_classes = 3

# from label indices to categorical (one-hot)
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from categorical (one-hot) back to label indices
y_pred = y_categorical.argmax(1)
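To get back to the human-readable labels, the recovered indices can be used to index the label list. A small illustrative sketch, assuming the 'cat'/'dog'/'house' labels from above:

import numpy as np

labels = ['cat', 'dog', 'house']  # class index -> name, as assumed above

y_pred = np.array([[0, 1, 0], [0, 0, 1]])
indices = y_pred.argmax(1)            # -> array([1, 2])
names = [labels[i] for i in indices]  # -> ['dog', 'house']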
This worked for me:
import numpy as np
from sklearn.metrics import confusion_matrix

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)
where y_test and y_predict are one-hot encoded (categorical) arrays.
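As a side note, since these are NumPy arrays, the list comprehensions can be replaced by a single vectorized argmax along axis 1. A minimal equivalent sketch, assuming the same y_test and y_predict one-hot arrays:

import numpy as np
from sklearn.metrics import confusion_matrix

# argmax along axis 1 collapses each one-hot row to its class index
conf_mat = confusion_matrix(y_test.argmax(axis=1), y_predict.argmax(axis=1))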
I simply subtracted the ground-truth matrix y_test from the prediction matrix y_pred while keeping the categorical (one-hot) format. I treated a -1 as a false negative and a 1 as a false positive.

Then, to separate true positives from true negatives:

if output_matrix[i,j] == 1 and predictions_matrix[i,j] == 1:
    produced_matrix[i,j] = 2
This yields the following encoding:
- -1: false negative
- 1: false positive
- 0: true negative
- 2: true positive
Finally, with some naive counting over this matrix, you can derive any confusion-matrix statistic.
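A minimal NumPy sketch of this scheme (the array names output_matrix and predictions_matrix follow the answer; the sample data is illustrative):

import numpy as np

# illustrative one-hot ground truth and predictions
output_matrix = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
predictions_matrix = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 1]])

# pred - truth: -1 = false negative, 1 = false positive, 0 = true negative (or true positive, for now)
produced_matrix = predictions_matrix - output_matrix

# mark true positives (a 1 in both matrices) with 2
produced_matrix[(output_matrix == 1) & (predictions_matrix == 1)] = 2

# naive per-class counts (columns are classes)
tp = (produced_matrix == 2).sum(axis=0)
fp = (produced_matrix == 1).sum(axis=0)
fn = (produced_matrix == -1).sum(axis=0)
tn = (produced_matrix == 0).sum(axis=0)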