Using pandas.ols on multiple dependent variables at the same time

I am wondering if I can apply a pandas.ols model to a DataFrame of multiple response variables against one independent variable, all at once.

So, imagine I have the following:

In [109]: y=pandas.DataFrame(np.random.randn(10,4))
In [110]: x=pandas.DataFrame(np.random.randn(10,1))

      

I would like to do something like this:

In [111]: model=pandas.ols(y=y, x=x)

      

Basically, I want four model outputs as the result, or at least access to the coefficients of all four. I would rather avoid iterating over the response variables, if possible.



2 answers


I think this should do it.



# Note: pd.ols requires pandas < 0.20; it was removed in later releases
import numpy as np
import pandas as pd

# First generate the data
x = pd.DataFrame(np.random.randn(10, 1))
y = pd.DataFrame(np.random.randn(10, 4))

# Since we are doing things manually, we need to add the constant term to the x matrix
x[1] = np.ones(10)

# Precompute (X'X)^-1 X', which we premultiply the y matrix by to get the results
tmpmat = np.dot(np.linalg.pinv(np.dot(x.T, x)), x.T)

# Solve for the betas: one column per response; row 0 is the slope, row 1 the intercept
betamatrix = np.dot(tmpmat, y)

# Compare with the pandas output one response at a time
model0 = pd.ols(y=y[0], x=x, intercept=False)
model1 = pd.ols(y=y[1], x=x, intercept=False)
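
The matrix approach above can also be sketched with NumPy alone, which may help on pandas 0.20 or later, where pandas.ols is no longer available. This is a minimal sketch, not the answer's own method: it swaps in np.linalg.lstsq, which accepts a 2-D right-hand side and so fits every response column in one call. The seed and the names X and coefs are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Reproducible toy data with the same shapes as in the question (seed is arbitrary)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((10, 1)))
y = pd.DataFrame(rng.standard_normal((10, 4)))

# Design matrix: the regressor column followed by a column of ones for the intercept
X = np.column_stack([x.to_numpy(), np.ones(len(x))])

# lstsq solves the least-squares problem for all four response columns at once
betas, residuals, rank, singular_values = np.linalg.lstsq(X, y.to_numpy(), rcond=None)

# One column per response variable; rows follow the column order of X
coefs = pd.DataFrame(betas, index=["slope", "intercept"], columns=y.columns)
print(coefs)
```

Each column of coefs should match what a separate single-response regression with an intercept would give for that column of y.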

      



I have done this many times and have not found an alternative to looping. The following code saves the results of the four regressions in a dict. If you are only interested in some of the coefficients, you can grab them while running the regressions.



model = {}
for i in y:
    model[i] = pd.ols(y=y[i], x=x)
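
As a rough illustration of grabbing only the coefficients inside the loop, here is a sketch that does not depend on pandas.ols. It assumes np.polyfit with deg=1 as a stand-in for a single-regressor OLS fit with an intercept; the seed and the names coefs and coef_table are hypothetical and not part of the answer.

```python
import numpy as np
import pandas as pd

# Reproducible toy data with the same shapes as in the question (seed is arbitrary)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((10, 1)))
y = pd.DataFrame(rng.standard_normal((10, 4)))

coefs = {}
for col in y:
    # A degree-1 polynomial fit returns the slope first, then the intercept
    slope, intercept = np.polyfit(x[0].to_numpy(), y[col].to_numpy(), deg=1)
    coefs[col] = {"slope": slope, "intercept": intercept}

# Collect the per-response coefficients into one table (one column per response)
coef_table = pd.DataFrame(coefs)
print(coef_table)
```

Keeping only the numbers you need in the dict avoids holding four full model objects when the coefficients are all you care about.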

      
