Typical regression approach, ideally with dplyr

Reading the documentation for do()

dplyr, I was impressed with the ability to create regression models for datasets and wondered if it could be replicated using different explanatory variables rather than datasets.

So far I have tried

require(dplyr)
data(mtcars)

models <- data.frame(var = c("cyl", "hp", "wt"))

models <- models %>% do(mod = lm(mpg ~ as.name(var), data = mtcars))
Error in as.vector(x, "symbol") : 
  cannot coerce type 'closure' to vector of type 'symbol'

models <- models %>% do(mod = lm(substitute(mpg ~ i, as.name(.$var)), data = mtcars))
Error in substitute(mpg ~ i, as.name(.$var)) : 
  invalid environment specified

      

The desired end result would be something like

  var slope standard_error_slope
1 cyl -2.87                 0.32
2  hp -0.07                 0.01
3  wt -5.34                 0.56

      

I know something like this is possible using lapply's approach , but finding an applicable family is pretty much incomprehensible. Is there a dplyr solution?

+3


source to share


2 answers


This is not pure "dplyr", but rather "dplyr" + "tidyr" + "data.table". However, I think it should be easy to read.

library(data.table)
library(dplyr)
library(tidyr)

mtcars %>%
  gather(var, val, cyl:carb) %>%
  as.data.table %>%
  .[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
#      var    Estimate  Std. Error
#  1:  cyl -2.87579014 0.322408883
#  2: disp -0.04121512 0.004711833
#  3:   hp -0.06822828 0.010119304
#  4: drat  7.67823260 1.506705108
#  5:   wt -5.34447157 0.559101045
#  6: qsec  1.41212484 0.559210130
#  7:   vs  7.94047619 1.632370025
#  8:   am  7.24493927 1.764421632
#  9: gear  3.92333333 1.308130699
# 10: carb -2.05571870 0.568545640

      



If you really just wanted multiple variables, start with a vector, not data.frame

.

models <- c("cyl", "hp", "wt")

mtcars %>%
  select_(.dots = c("mpg", models)) %>%
  gather(var, val, -mpg) %>%
  as.data.table %>%
  .[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
#    var    Estimate Std. Error
# 1: cyl -2.87579014  0.3224089
# 2:  hp -0.06822828  0.0101193
# 3:  wt -5.34447157  0.5591010

      

+4


source


Nothing too complicated about the approach on the linked page. The use substitute

and as.name

is a bit of a mystery, but it is easily corrected.

varlist <- names(mtcars)[-1]
models <- lapply(varlist, function(x) {
    form <- formula(paste("mpg ~", x))
    lm(form, data=mtcars)
})

      



dplyr is not all and all of R programming. I would advise you to familiarize yourself with the * apply functions as they will be useful in many situations where dplyr does not work.

+7


source







All Articles