Python sklearn logistic regression model: wrong settings

I am trying to reproduce the logistic regression example from the Wikipedia page on logistic regression. My code looks like this:

import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([0.5, 0.75, 1, 1.25, 1.5, 1.75, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 4, 4.25, 4.5, 4.75, 5, 5.5])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

logistic = LogisticRegression()
logistic.fit(x[:, None], y)  # x[:, None] reshapes x into the 2-D array sklearn expects

      

But how do I get a summary of the fitted model, specifically something like this:

            Coefficient  Std.Error  z-value  P-value (Wald)
Intercept   −4.0777      1.7610     −2.316    0.0206
Hours        1.5046      0.6287      2.393    0.0167

      

That table comes from the Wikipedia page for the model. If I print the coefficient and the intercept, I get something like:

print(logistic.coef_)
print(logistic.intercept_)

      

[[0.61126347]]
[-1.36550178]

These values are clearly different. Why do my results differ from the ones shown on the Wikipedia page?

2 answers


The Wikipedia example does not regularize the model parameters, but sklearn's LogisticRegression uses L2 regularization by default. Set the inverse regularization strength C to a very large value so that the penalty becomes negligible, for example:



logistic = LogisticRegression(penalty='l2', C=1e4)  # large C => very weak L2 penalty
logistic.fit(x[:, None], y)

print(logistic.coef_)
print(logistic.intercept_)

# [[ 1.50459727]]
# [-4.07757136]
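
As a side note, recent scikit-learn releases (1.2 and later) also accept penalty=None, which switches regularization off entirely instead of approximating that with a large C. A minimal sketch, assuming such a version is installed:

from sklearn.linear_model import LogisticRegression

# penalty=None disables the L2 penalty completely (scikit-learn >= 1.2)
unpenalized = LogisticRegression(penalty=None)
unpenalized.fit(x[:, None], y)

print(unpenalized.coef_)       # expected to be close to 1.5046
print(unpenalized.intercept_)  # expected to be close to -4.0777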

      

sklearn does not provide an R-style model summary.

For classification tasks there is sklearn.metrics.classification_report, which computes several metrics from the model's predictions.
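
For example, using the model and data from the question (a minimal sketch; the exact numbers depend on the fitted model):

from sklearn.metrics import classification_report

# compare the fitted model's predictions with the true labels
print(classification_report(y, logistic.predict(x[:, None])))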



For an R-style summary report, look at the statsmodels library.
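
A minimal sketch with statsmodels, assuming the same x and y arrays as in the question; Logit fits an unregularized model, so its summary should reproduce the coefficient, standard error, z-value and p-value table from the Wikipedia page:

import statsmodels.api as sm

# add an intercept column and fit an unregularized logit model
X = sm.add_constant(x)
result = sm.Logit(y, X).fit()

# reports coef, std err, z and P>|z| for the constant and for x1
print(result.summary())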
