Fitting the same simple regression with different explanatory variables, ideally with dplyr
Reading the documentation for dplyr's do(), I was impressed by its ability to fit a regression model for each group of a dataset. I wondered whether the same thing could be done across different explanatory variables rather than across datasets.
So far I have tried:
require(dplyr)
data(mtcars)
models <- data.frame(var = c("cyl", "hp", "wt"))
models <- models %>% do(mod = lm(mpg ~ as.name(var), data = mtcars))
Error in as.vector(x, "symbol") :
cannot coerce type 'closure' to vector of type 'symbol'
models <- models %>% do(mod = lm(substitute(mpg ~ i, as.name(.$var)), data = mtcars))
Error in substitute(mpg ~ i, as.name(.$var)) :
invalid environment specified
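The following is our reading of the two errors; the question does not state it. Inside do(), the input's column names do not appear to be bound as variables, so var resolves to the function stats::var. as.name() cannot turn a function (a "closure") into a symbol. In the second attempt, substitute() expects a list or an environment as its second argument, not a symbol. The minimal base R sketch below reproduces both messages and shows a working substitute() call. The variable v is a hypothetical stand-in for one entry of the var column.

```r
# Outside a data mask, `var` is simply stats::var, a function ("closure"),
# so as.name() fails with the same message as in the first attempt.
msg1 <- tryCatch(as.name(var), error = function(e) conditionMessage(e))
print(msg1)

# Hypothetical stand-in for one entry of the `var` column.
v <- "cyl"

# substitute() needs a list (or environment) mapping placeholder names to
# replacements; a bare symbol triggers "invalid environment specified".
msg2 <- tryCatch(substitute(mpg ~ i, as.name(v)),
                 error = function(e) conditionMessage(e))
print(msg2)

# Passing a list works; eval() turns the resulting call into a formula.
f <- eval(substitute(mpg ~ i, list(i = as.name(v))))
print(f)                      # the formula mpg ~ cyl
fit <- lm(f, data = mtcars)
print(coef(fit))
```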
The desired end result would be something like:
var slope standard_error_slope
1 cyl -2.87 0.32
2 hp -0.07 0.01
3 wt -5.34 0.56
I know something like this is possible with an lapply approach, but I find the apply family pretty much incomprehensible. Is there a dplyr solution?
This is not pure "dplyr", but rather "dplyr" + "tidyr" + "data.table". However, I think it should be easy to read.
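library(data.table)
library(dplyr)
library(tidyr)
mtcars %>%
  gather(var, val, cyl:carb) %>%   # long format: one row per (car, predictor)
  as.data.table %>%
  # per predictor: row 2 of the coefficient matrix is the slope; columns 1:2 are its estimate and SE
  .[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]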
# var Estimate Std. Error
# 1: cyl -2.87579014 0.322408883
# 2: disp -0.04121512 0.004711833
# 3: hp -0.06822828 0.010119304
# 4: drat 7.67823260 1.506705108
# 5: wt -5.34447157 0.559101045
# 6: qsec 1.41212484 0.559210130
# 7: vs 7.94047619 1.632370025
# 8: am 7.24493927 1.764421632
# 9: gear 3.92333333 1.308130699
# 10: carb -2.05571870 0.568545640
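For readers without data.table or tidyr, the same reshape-then-group idea can be sketched in base R. This is a swapped-in equivalent, not part of the answer above. The data are stacked into long format by hand, split() plays the role of by = var, and summary(lm(...))$coefficients[2, 1:2] extracts the slope's estimate and standard error exactly as in the data.table call. The names predictors, long and slopes are illustrative.

```r
# Stack every predictor into long format: one row per (car, predictor) pair,
# with mpg repeated alongside each predictor's values.
predictors <- names(mtcars)[-1]          # all columns except mpg
long <- data.frame(
  mpg = rep(mtcars$mpg, times = length(predictors)),
  var = rep(predictors, each = nrow(mtcars)),
  val = unlist(mtcars[predictors], use.names = FALSE)
)

# Fit mpg ~ val separately for each predictor; row 2 of the coefficient
# matrix is the slope, and columns 1:2 are its estimate and standard error.
slopes <- t(sapply(split(long, long$var), function(d) {
  summary(lm(mpg ~ val, data = d))$coefficients[2, 1:2]
}))
print(slopes)
```

split() orders the groups alphabetically, so the rows come out in a different order than in the data.table result, but the numbers match.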
If you only want a few specific variables, start with a character vector rather than a data.frame:
models <- c("cyl", "hp", "wt")
mtcars %>%
  select(all_of(c("mpg", models))) %>%   # select_() is deprecated; all_of() takes a character vector
gather(var, val, -mpg) %>%
as.data.table %>%
.[, as.list(summary(lm(mpg ~ val))$coefficients[2, 1:2]), by = var]
# var Estimate Std. Error
# 1: cyl -2.87579014 0.3224089
# 2: hp -0.06822828 0.0101193
# 3: wt -5.34447157 0.5591010
There is nothing too complicated about the approach on the linked page. Your use of substitute and as.name is a bit of a mystery, but it is easily corrected:
varlist <- names(mtcars)[-1]   # every column except the response, mpg
models <- lapply(varlist, function(x) {
  form <- formula(paste("mpg ~", x))   # e.g. the string "mpg ~ cyl" becomes a formula
  lm(form, data = mtcars)
})
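One more pass over that list of models produces the table from the question (var, slope, standard_error_slope). This is a minimal base R sketch that assumes the column names from the question's desired output. The names rows and result are illustrative.

```r
# Refit one simple regression per predictor (same as the lapply above).
varlist <- names(mtcars)[-1]
models <- lapply(varlist, function(x) lm(formula(paste("mpg ~", x)), data = mtcars))

# Extract the slope (row 2) and its standard error from each model summary,
# then stack the one-row data frames into a single table.
rows <- Map(function(x, m) {
  cf <- summary(m)$coefficients
  data.frame(var = x,
             slope = cf[2, "Estimate"],
             standard_error_slope = cf[2, "Std. Error"])
}, varlist, models)
result <- do.call(rbind, rows)
rownames(result) <- NULL
print(result)
```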
dplyr is not the be-all and end-all of R programming. I would advise you to familiarize yourself with the *apply functions, as they are useful in many situations where dplyr does not fit.
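As a quick orientation to the apply family, here is a small sketch (our own illustration, using three of the predictors) that extracts the same slopes with lapply(), sapply() and vapply(). The three functions differ only in the shape of what they return.

```r
varlist <- c("cyl", "hp", "wt")
models <- lapply(varlist, function(x) lm(formula(paste("mpg ~", x)), data = mtcars))

# lapply() always returns a list, one element per input.
slope_list <- lapply(models, function(m) coef(m)[[2]])

# sapply() tries to simplify that list; here it becomes a numeric vector.
slope_vec <- sapply(models, function(m) coef(m)[[2]])

# vapply() is like sapply() but checks every result against a template
# (here: a single double), so an unexpected shape raises an error instead
# of silently returning something else.
slope_chk <- vapply(models, function(m) coef(m)[[2]], numeric(1))
names(slope_chk) <- varlist
print(slope_chk)
```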