Python sklearn logistic regression model gives wrong coefficients
I am trying to reproduce the example from the logistic regression Wikipedia page. My code looks like this:
import numpy as np
from sklearn.linear_model import LogisticRegression
x = np.array([0.5, 0.75, 1, 1.25, 1.5, 1.75, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
logistic = LogisticRegression()
logistic.fit(x[:, None], y)
But how do you get a summary of the fitted model, specifically something like this:
            Coefficient  Std.Error  z-value  P-value (Wald)
Intercept   −4.0777      1.7610     −2.316   0.0206
Hours        1.5046      0.6287      2.393   0.0167
This table comes from the Wikipedia page for the model. If I print the coefficients and the intercept, I get:
print(logistic.coef_)
print(logistic.intercept_)
[[0.61126347]]
[-1.36550178]
These values are clearly different.
Why do my results differ from the ones on the Wikipedia page?
The Wikipedia example does not regularize the model parameters, but sklearn's LogisticRegression
uses L2 regularization by default. Set the inverse regularization strength C
to a very large value to effectively disable regularization, for example:
logistic = LogisticRegression(penalty='l2', C=1e4)
logistic.fit(x[:, None],y)
print(logistic.coef_)
print(logistic.intercept_)
# [[ 1.50459727]]
# [-4.07757136]
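Note: if I remember correctly, recent scikit-learn releases (1.2 and later) also let you switch regularization off outright instead of approximating it with a large C, roughly like this:
# Newer scikit-learn (>= 1.2) accepts penalty=None to disable regularization
# entirely (older releases used the string penalty='none' instead).
logistic = LogisticRegression(penalty=None)
logistic.fit(x[:, None], y)
print(logistic.coef_)       # should come out close to [[1.5046]]
print(logistic.intercept_)  # should come out close to [-4.0777]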
sklearn does not provide an R-style summary report.
For classification tasks there is sklearn.metrics.classification_report, which computes several prediction-quality metrics, for example:
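A quick sketch, applied to the model fitted in the question:
from sklearn.metrics import classification_report
# Predicted class labels for the training data
y_pred = logistic.predict(x[:, None])
# Prints precision, recall, f1-score and support per class
print(classification_report(y, y_pred))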
For an R-style summary report (coefficients, standard errors, z-values, p-values), look at the statsmodels library.
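Something along these lines (a minimal sketch, assuming the same x and y arrays as in the question) should reproduce the Wikipedia table, since statsmodels fits by unregularized maximum likelihood:
import statsmodels.api as sm
X = sm.add_constant(x)         # add an intercept column to the predictor
result = sm.Logit(y, X).fit()  # unregularized maximum-likelihood fit
print(result.summary())        # coefficients, std. errors, z-values, p-values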