Using pandas.ols on multiple dependent variables at the same time
I am wondering whether I can use pandas.ols to regress a DataFrame of several response variables against a single independent variable, all in one call.
So, imagine I have the following:
In [109]: y=pandas.DataFrame(np.random.randn(10,4))
In [110]: x=pandas.DataFrame(np.random.randn(10,1))
I would like to do something like this:
In [111]: model=pandas.ols(y=y, x=x)
Basically, I want the result to be four model outputs, or at least access to the coefficients of all four. I would rather not iterate over the response variables, if possible.
2 answers
I think this should do it.
import numpy as np
import pandas as pd
#First generate the data
x=pd.DataFrame(np.random.randn(10,1))
y=pd.DataFrame(np.random.randn(10,4))
#Since we are doing things manually we'll need to add the constant term to the x matrix
x[1] = np.ones(10)
#This matrix precomputes (X'X)^-1 X', which we premultiply the y matrix by to get the results
tmpmat = np.dot(np.linalg.pinv(np.dot(x.T ,x)),x.T)
#Solve for the betas: one column per response variable (row 0 = slope, row 1 = constant)
betamatrix = np.dot(tmpmat,y)
#Compare with the pandas output one column at a time:
#betamatrix[:,0] against model0, betamatrix[:,1] against model1
model0=pd.ols(y=y[0], x=x, intercept=False)
model1=pd.ols(y=y[1], x=x, intercept=False)
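The same betas can also be obtained without forming (X'X)^-1 explicitly. The following is only a sketch, assuming a NumPy-only approach is acceptable: np.linalg.lstsq accepts a two-dimensional right-hand side, so all four response columns are solved in a single call. The seed, the column name "x", and the labels "x" and "intercept" are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same shapes as in the question (seed is arbitrary).
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((10, 1)), columns=["x"])
y = pd.DataFrame(rng.standard_normal((10, 4)))

# Design matrix: the regressor plus a column of ones for the constant term.
X = np.column_stack([x["x"].to_numpy(), np.ones(len(x))])

# lstsq takes a 2-D right-hand side and solves every column in one call.
coef, _, _, _ = np.linalg.lstsq(X, y.to_numpy(), rcond=None)

# Rows are the model terms, columns are the response variables.
betas = pd.DataFrame(coef, index=["x", "intercept"], columns=y.columns)
print(betas)
```

Each column of betas holds the slope and constant for the matching column of y, so betas[0] corresponds to the regression of y[0] on x.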
+1
I have done this many times and have not found an alternative to looping. The following code saves the results of the four regressions in a dict. If you are only interested in some of the coefficients, you can extract them as you run the regressions.
model = {}
for i in y:
    model[i] = pd.ols(y=y[i], x=x)
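Note that pd.ols was removed in pandas 0.20, so the loop above only runs on older versions. As a rough sketch of the same pattern on a current install, the per-column fit can be done with np.polyfit (swapped in here for pd.ols, and returning only the coefficients rather than a full model object). The seed and the keys "x" and "intercept" are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same shapes as in the question (seed is arbitrary).
rng = np.random.default_rng(1)
x = pd.Series(rng.standard_normal(10), name="x")
y = pd.DataFrame(rng.standard_normal((10, 4)))

coefs = {}
for col in y:  # iterating over a DataFrame yields its column labels
    # A degree-1 polynomial fit returns the slope first, then the intercept.
    slope, intercept = np.polyfit(x.to_numpy(), y[col].to_numpy(), deg=1)
    coefs[col] = {"x": slope, "intercept": intercept}

# One column per response variable, one row per coefficient.
result = pd.DataFrame(coefs)
print(result)
```

Grabbing only the slope, for example, would mean storing just that value in the dict inside the loop.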