Linear regression in scikit-learn

I have a question regarding the LinearRegression model in scikit-learn

( http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html )

If we run the following code:

from sklearn import linear_model
import pandas as pd
import numpy as np

dates   = pd.date_range('20000101', periods=100)
df      = pd.DataFrame(index=dates)
df['A'] = np.cumsum(np.random.randn(100))  
df['B'] = np.cumsum(np.random.randn(100))
df['C'] = np.cumsum(np.random.randn(100))
df['D'] = np.cumsum(np.random.randn(100))  
df['E'] = np.cumsum(np.random.randn(100))
df['F'] = np.cumsum(np.random.randn(100))

y       = ['A','B','C']
x       = ['D','E','F']


ols     = linear_model.LinearRegression(fit_intercept = True, 
                                        normalize     = True, 
                                        copy_X        = True, 
                                        n_jobs        = 1)

ols.fit(df[x],df[y])

      

What is it doing here?

Does it run 3 separate OLS regressions? That is,

1) OLS of df['A'] on df[['D','E','F']]

2) OLS of df['B'] on df[['D','E','F']]

and

3) OLS of df['C'] on df[['D','E','F']]

Or does it run a single OLS of df[['A','B','C']] on df[['D','E','F']]?

(I think this is called SUR? Not sure ...)



1 answer


I ran a few tests to work this out.

After running the code from the question:

ols.coef_
array([[-0.5273036 ,  0.56382854,  0.24751725], # train for 'A'
       [-0.10430077,  0.10671576,  0.18554053],  # train for 'B'
       [ 0.01481826,  0.03811442,  0.75333578]]) # train for 'C'

      

We can see that coef_ contains 3 rows, one per target column, and each row holds three coefficients, one per feature.
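The layout of coef_ can be checked directly from its shape. This is a minimal sketch, assuming synthetic data with a fixed seed (the seed and the names X, Y and model are illustrative, not from the question). It leaves out normalize, which recent scikit-learn releases no longer accept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)       # assumed seed, for reproducibility only
X = rng.standard_normal((100, 3))    # stands in for df[['D','E','F']]
Y = rng.standard_normal((100, 3))    # stands in for df[['A','B','C']]

model = LinearRegression(fit_intercept=True).fit(X, Y)

# One row per target column, one column per feature
print(model.coef_.shape)       # (3, 3)
# One intercept per target column
print(model.intercept_.shape)  # (3,)
```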

Then we fit a separate model on df['A'] alone:



a = linear_model.LinearRegression(fit_intercept = True, 
                                  normalize     = True, 
                                  copy_X        = True, 
                                  n_jobs        = 1)
a.fit(df[x],df['A'])
a.coef_
array([-0.5273036 ,  0.56382854,  0.24751725])

      

which gives the same coefficients as the first row of ols.coef_ above. Likewise:

a.fit(df[x],df['B'])
a.coef_
array([-0.10430077,  0.10671576,  0.18554053])

      

which gives the same coefficients as the second row of ols.coef_ above, and so on.

So when you call ols.fit(df[x],df[y]), it fits three separate linear regressions, one for each target column in y.
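The comparison above can be automated rather than checked by eye. This is a sketch under stated assumptions: the seed is arbitrary, the names joint, single and matches are illustrative, and normalize is omitted because recent scikit-learn releases no longer accept it, so the numbers will differ from the ones printed earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)  # assumed seed, not from the question
dates = pd.date_range('20000101', periods=100)
# Six independent random walks, as in the question
df = pd.DataFrame(np.cumsum(rng.standard_normal((100, 6)), axis=0),
                  index=dates, columns=list('ABCDEF'))

y = ['A', 'B', 'C']
x = ['D', 'E', 'F']

# One call with all three targets at once
joint = LinearRegression(fit_intercept=True).fit(df[x], df[y])

# Three separate single-target fits, compared row by row
matches = []
for i, target in enumerate(y):
    single = LinearRegression(fit_intercept=True).fit(df[x], df[target])
    same_coef = np.allclose(joint.coef_[i], single.coef_)
    same_intercept = np.allclose(joint.intercept_[i], single.intercept_)
    matches.append(bool(same_coef and same_intercept))

print(matches)
```

If every entry in matches is True, the joint fit reproduces the three separate fits up to floating-point tolerance.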
