R lm: dynamically create regressions

I have a set of dependent variables y1, y2, ..., a set of independent variables x1, x2, ..., and a set of controls d1, d2, .... All of them are inside a data.table, let's call it data.


I need to do something along the lines of

out1 <- lm(y1 ~ x1, data=data)
out2 <- lm(y1 ~ x1 + d1 + d2, data=data)


This is of course not very nice, so I was thinking about writing a list containing all these regressions rather than just repeating them. Something along the lines of

myRegressions <- list('out1' = y1 ~ x1, 'out2' = y1 ~ x1 + d1 + d2)
output <- NULL
for (reg in myRegressions)
    output[reg] <- lm(myRegressions[[reg]])


This will of course not work: I cannot create the list since the syntax is not valid outside lm(). What's a good approach here instead?



3 answers

Formulas can be specified as strings:

myReg <- list('out1' = "mpg ~ cyl")

lm(formula = myReg[[1]], data = mtcars)

(Intercept)          cyl  
     37.885       -2.876  
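
The same idea extends to several named specifications. A minimal sketch, assuming the built-in mtcars data and two specification names (out1, out2) chosen purely for illustration; as.formula converts each string explicitly before it reaches lm:

```r
# Formulas stored as strings in a named list (names are illustrative).
myReg <- list(out1 = "mpg ~ cyl",
              out2 = "mpg ~ cyl + wt + hp")

# Convert each string to a formula and fit it; lapply keeps the list names.
fits <- lapply(myReg, function(f) lm(as.formula(f), data = mtcars))

# Collect the coefficient vectors, one list element per specification.
coefs <- lapply(fits, coef)
print(coefs$out1)
```

Each fit can then be reached by name, for example fits$out2 or summary(fits$out1).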




Using the built-in data frame anscombe, try this:

formulas = list(y1 ~ x1, y2 ~ x2)
lapply(formulas, function(fo) do.call("lm", list(fo, data = quote(anscombe))))




lm(formula = y1 ~ x1, data = anscombe)

(Intercept)           x1  
     3.0001       0.5001  


lm(formula = y2 ~ x2, data = anscombe)

(Intercept)           x2  
      3.001        0.500  


Note that the Call: portion of the output shows the exact formula that was used, which will be useful if there are many components in the output list.
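
To see what do.call contributes here, it may help to compare the stored call with that of a plain anonymous-function wrapper. A small sketch, assuming the built-in anscombe data; the names plain and viaDoCall are illustrative only:

```r
formulas <- list(y1 ~ x1, y2 ~ x2)

# Plain wrapper: the stored call refers to the argument name `fo`.
plain <- lapply(formulas, function(fo) lm(fo, data = anscombe))

# do.call wrapper: the formula itself is substituted into the call,
# and quote() keeps the data argument as the symbol `anscombe`.
viaDoCall <- lapply(formulas, function(fo)
  do.call("lm", list(fo, data = quote(anscombe))))

print(plain[[1]]$call)
print(viaDoCall[[1]]$call)
```

Both versions estimate the same coefficients; only the recorded call differs.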



You can use paste0 and as.formula to create formulas and then just pass them to lm(), e.g.

regressors <- c("x1", "x1 + x2", "x1 + x2 + x3")

for (i in seq_along(regressors)) {
  # build and print the formula y1 ~ <regressors[i]>
  print(as.formula(paste0("y1", "~", regressors[i])))
}


This prints the formulas. Just save them in a list and loop over that list with something like

lapply(stored_formulas, function(x) { lm(x, data=yourData) })
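
Putting the pieces together, here is a sketch that builds one formula per combination of response and regressor set, stores them in a named list, and fits them all. It assumes the built-in mtcars data in place of the asker's table, and it uses base R's reformulate as an alternative to paste0 plus as.formula; the names responses, rhs, grid and the chosen columns are illustrative, not taken from the question:

```r
responses <- c("mpg", "qsec")             # dependent variables (illustrative)
rhs <- list(base = "wt",                  # regressor only
            ctrl = c("wt", "cyl", "hp"))  # regressor plus controls

# One row per (response, right-hand side) combination.
grid <- expand.grid(y = responses, spec = names(rhs),
                    stringsAsFactors = FALSE)

# reformulate() builds a formula from term labels and a response name.
stored_formulas <- Map(function(y, spec) reformulate(rhs[[spec]], response = y),
                       grid$y, grid$spec)
names(stored_formulas) <- paste(grid$y, grid$spec, sep = "_")

# Fit every stored formula and compare R-squared across specifications.
fits <- lapply(stored_formulas, function(f) lm(f, data = mtcars))
print(sapply(fits, function(m) summary(m)$r.squared))
```

Naming the list elements after the response and specification makes it easy to pick out a single model later, for example fits$mpg_ctrl.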



