Linear regression in R using lm: different final output in a function
A quick question about linear regression in R using lm(). I noticed that the output differs when summary() is called inside a function.
When I enter:
model1 <- lm(PostVal_Ave ~ Int)
summary(model1)
The following is returned in the console:
Call:
lm(formula = PostVal_Ave ~ Int)
Residuals:
Min 1Q Median 3Q Max
-3.9871 -0.8897 0.4853 1.0129 1.5129
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.4871     0.1426  38.491   <2e-16 ***
Int           0.2776     0.1988   1.396    0.164
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.322 on 175 degrees of freedom
(35 observations deleted due to missingness)
Multiple R-squared: 0.01102, Adjusted R-squared: 0.005366
F-statistic: 1.949 on 1 and 175 DF, p-value: 0.1644
But when I write a function to generate output for several models across multiple dependent variables, I enter:
allModels <- function(x){
  model2 <- lm(x ~ Int)
  model2.1 <- lm(x ~ Int + cPreEff)
  model2.2 <- lm(x ~ Int + cPreEff + Gender + Grade)
  return(c(summary(model2), summary(model2.1), summary(model2.2)))
}
I get the same information as for model1, but with a lot of additional output for the three models (model2, model2.1 and model2.2): specifically, the residuals for every case in each model, and information about every case with missing data. Any advice would be much appreciated. Thanks.
Note that lm() returns an object of class "lm", and calling summary() on that object creates an object of class "summary.lm". There are custom print methods, print.lm() and print.summary.lm(), so what gets printed to the console may differ from what is actually stored in the object itself.
When you manually concatenate (c()) two summary.lm objects, you create one aggregated list and lose the corresponding class. You probably want to return a list of objects:
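A short illustration of this class/dispatch point, using mtcars since the question's data is not available:

```r
# lm() returns an "lm" object; summary() returns a "summary.lm" object.
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)        # "lm"

s <- summary(fit)
class(s)          # "summary.lm" -- printed via print.summary.lm()

# c() on two summaries drops the class attribute, so the result is a
# plain list and the pretty print method is no longer dispatched:
class(c(s, s))    # "list"
```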
return(list(summary(model2), summary(model2.1), summary(model2.2)))
An alternative is David Robinson's broom package, which converts model output into a data frame. For a one-row summary of each fit (no coefficients), use glance():
library(broom)
fitIris <- function() {
fit1 <- glance(lm(Sepal.Length ~ Species, data = iris))
fit2 <- glance(lm(Petal.Length ~ Species, data = iris))
fit3 <- glance(lm(Sepal.Width ~ Species, data = iris))
out <- rbind(fit1, fit2, fit3)
cbind(Fit = c("fit1", "fit2", "fit3"), out)
}
fitIris()
# Fit r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
# 1 fit1 0.6187057 0.6135181 0.5147894 119.26450 1.669669e-31 3 -111.7260 231.4520 243.4945 38.9562 147
# 2 fit2 0.9413717 0.9405741 0.4303345 1180.16118 2.856777e-91 3 -84.8467 177.6934 189.7359 27.2226 147
# 3 fit3 0.4007828 0.3926302 0.3396877 49.16004 4.492017e-17 3 -49.3663 106.7326 118.7751 16.9620 147
For coefficient-level information, use tidy() instead, but the number of rows per model will vary, so the combined data.frame needs to be labeled carefully.
fitIris2 <- function() {
fit1 <- tidy(lm(Sepal.Length ~ Species, data = iris))
fit2 <- tidy(lm(Petal.Length ~ Species, data = iris))
fit3 <- tidy(lm(Sepal.Width ~ Species, data = iris))
out <- rbind(fit1, fit2, fit3)
cbind(Fit = rep(c("fit1", "fit2", "fit3"), each=3), out)
}
fitIris2()
# Fit term estimate std.error statistic p.value
# 1 fit1 (Intercept) 5.006 0.07280222 68.761639 1.134286e-113
# 2 fit1 Speciesversicolor 0.930 0.10295789 9.032819 8.770194e-16
# 3 fit1 Speciesvirginica 1.582 0.10295789 15.365506 2.214821e-32
# 4 fit2 (Intercept) 1.462 0.06085848 24.022945 9.303052e-53
# 5 fit2 Speciesversicolor 2.798 0.08606689 32.509597 5.254587e-69
# 6 fit2 Speciesvirginica 4.090 0.08606689 47.521176 4.106139e-91
# 7 fit3 (Intercept) 3.428 0.04803910 71.358540 5.707614e-116
# 8 fit3 Speciesversicolor -0.658 0.06793755 -9.685366 1.832489e-17
# 9 fit3 Speciesvirginica -0.454 0.06793755 -6.682608 4.538957e-10
I have not used broom for very long, so there may well be better ways to combine these data frames.
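One possible variation (a sketch, not necessarily the best way): build the formulas programmatically with reformulate(), tidy() each fit, and label and bind the pieces in a single pass, so the function does not need one hand-written line per model.

```r
library(broom)

responses <- c("Sepal.Length", "Petal.Length", "Sepal.Width")

# tidy() each model and tag every row with its response variable,
# then stack the labeled pieces into one data frame.
fits <- lapply(responses, function(y) {
  tdy <- tidy(lm(reformulate("Species", response = y), data = iris))
  cbind(Fit = y, tdy)
})
out <- do.call(rbind, fits)

head(out, 3)   # same shape as fitIris2(), 3 coefficient rows per model
```

This scales to any number of response variables by editing only the `responses` vector.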