Converting a probability vector to a target vector in Python?
I am doing logistic regression on the iris dataset from sklearn. I know the math and am trying to implement it myself. In the last step I get a prediction vector, where each entry is the probability that the corresponding data point belongs to class 1 rather than class 2 (binary classification).
Now I want to turn this prediction vector into a target vector. Say the probability is greater than 50%: then the corresponding data point belongs to class 1, otherwise to class 2. I use 0 to represent class 1 and 1 to represent class 2.
I know I can do this with a for loop that iterates over the entire vector. But when the vector gets large the loop becomes very expensive, so I want to do it more efficiently as a vectorized (matrix-style) operation, which is faster than doing the same work element by element in a loop.
Any suggestion for a faster method?
A more general solution for a 2D array that has many vectors with many classes:
import numpy as np

a = np.array([[.5, .3, .2],
              [.1, .2, .7],
              [ 1,  0,  0]])
idx = np.argmax(a, axis=-1)
a = np.zeros(a.shape)
a[np.arange(a.shape[0]), idx] = 1
print(a)
Output:
[[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]]
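If integer class labels are wanted instead of one-hot rows, the argmax step on its own should be enough. A minimal sketch reusing the same example array (the names `probs` and `labels` are only for illustration):

```python
import numpy as np

probs = np.array([[.5, .3, .2],
                  [.1, .2, .7],
                  [1., 0., 0.]])

# Index of the largest probability in each row is the predicted class label.
labels = np.argmax(probs, axis=-1)
print(labels)  # [0 2 0]
```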
If you want to go on to compute a confusion matrix etc., and so need the predictions back in the original format of the target variable in scikit-learn, array([1 0... 1]), you can use:
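import numpy as np
a = clf.predict_proba(X_test)[:,1]  # probability of the second class for each sample
a = np.where(a>0.5, 1, 0)           # 1 where the probability exceeds 0.5, else 0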
Here [:,1] refers to the second class (in my case: 1); the first class in my case was 0.
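For the labelling described in the question (0 when the probability is above 0.5, 1 otherwise), the comparison can also be flipped and cast directly. A minimal sketch, assuming `p` is a 1D array of class-1 probabilities; the values below are made up for illustration:

```python
import numpy as np

# Hypothetical probabilities that each point belongs to class 1.
p = np.array([0.9, 0.2, 0.5, 0.7])

# Boolean mask cast to integers: 0 where p > 0.5 (class 1), 1 otherwise (class 2).
target = (p <= 0.5).astype(int)
print(target)  # [0 1 1 0]
```

The comparison runs over the whole array at once, so no explicit Python loop is needed.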