Linear regression in R using lm: different final output in a function

A quick question about linear regression in R using an lm function. I noticed that the output is different when using the summary command as part of the function.

When I enter:

model1 <- lm (PostVal_Ave ~ Int)
summary(model1)

      

The following is returned in the console:

Call:
lm(formula = PostVal_Ave ~ Int)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9871 -0.8897  0.4853  1.0129  1.5129 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.4871     0.1426  38.491   <2e-16
Int           0.2776     0.1988   1.396    0.164

(Intercept) ***
Int            
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.322 on 175 degrees of freedom
(35 observations deleted due to missingness)
Multiple R-squared:  0.01102,   Adjusted R-squared:  0.005366 
F-statistic: 1.949 on 1 and 175 DF,  p-value: 0.1644

      

But when writing a function to be able to generate output for multiple models and be able to generate results for multiple dependent variables, I enter:

allModels <- function(x){
    model2 <- lm (x ~ Int)
    model2.1 <- lm (x ~ Int + cPreEff)
    model2.2 <- lm (x ~ Int + cPreEff + Gender + Grade)
    return(c(summary(model1), summary(model1.1), summary(model1.2)))}

      

And I get the same result compared to the output for model 1, but with a lot of additional output for these three models (model2, model2.1 and model2.2). Specifically, the output contains residuals for each case for each of the three models and information about each case with missing data. The advice would be much appreciated. Thank.

+3


source to share


2 answers


Note that it lm()

returns an object of class "lm" and summary()

on this object creates an object "summary.lm". There are custom objects print.lm()

and print.summary.lm()

. Therefore, anything that is ever printed to the console may be different from what is in the object itself.

When you manually concatenate ( c()

) two summary.lm objects, you create one aggregated list and lose the corresponding class. You probably want to return a list of objects



return(list(summary(model1), summary(model1.1), summary(model1.2)))

      

+2


source


An alternative is to use David Robinson's broom package , which converts the mapping output to a dataframe. For a short description of the match (no coefficients) using glance

:

library(broom)

fitIris <- function() {
  fit1 <- glance(lm(Sepal.Length ~ Species, data = iris))
  fit2 <- glance(lm(Petal.Length ~ Species, data = iris))
  fit3 <- glance(lm(Sepal.Width ~ Species, data = iris))
  out <- rbind(fit1, fit2, fit3)
  cbind(Fit = c("fit1", "fit2", "fit3"), out)
}

fitIris()

#    Fit r.squared adj.r.squared     sigma  statistic      p.value df    logLik      AIC      BIC deviance df.residual
# 1 fit1 0.6187057     0.6135181 0.5147894  119.26450 1.669669e-31  3 -111.7260 231.4520 243.4945  38.9562         147
# 2 fit2 0.9413717     0.9405741 0.4303345 1180.16118 2.856777e-91  3  -84.8467 177.6934 189.7359  27.2226         147
# 3 fit3 0.4007828     0.3926302 0.3396877   49.16004 4.492017e-17  3  -49.3663 106.7326 118.7751  16.9620         147

      

For more information on fitting, you should use tidy

, but the number of lines will change and the end of the data.frame should be carefully crafted.



fitIris2 <- function() {
  fit1 <- tidy(lm(Sepal.Length ~ Species, data = iris))
  fit2 <- tidy(lm(Petal.Length ~ Species, data = iris))
  fit3 <- tidy(lm(Sepal.Width  ~ Species, data = iris))
  out <- rbind(fit1, fit2, fit3)
  cbind(Fit = rep(c("fit1", "fit2", "fit3"), each=3), out)
}

fitIris2()

#    Fit              term estimate  std.error statistic       p.value
# 1 fit1       (Intercept)    5.006 0.07280222 68.761639 1.134286e-113
# 2 fit1 Speciesversicolor    0.930 0.10295789  9.032819  8.770194e-16
# 3 fit1  Speciesvirginica    1.582 0.10295789 15.365506  2.214821e-32
# 4 fit2       (Intercept)    1.462 0.06085848 24.022945  9.303052e-53
# 5 fit2 Speciesversicolor    2.798 0.08606689 32.509597  5.254587e-69
# 6 fit2  Speciesvirginica    4.090 0.08606689 47.521176  4.106139e-91
# 7 fit3       (Intercept)    3.428 0.04803910 71.358540 5.707614e-116
# 8 fit3 Speciesversicolor   -0.658 0.06793755 -9.685366  1.832489e-17
# 9 fit3  Speciesvirginica   -0.454 0.06793755 -6.682608  4.538957e-10

      

I have not been a user of this for a very long time, so there might be better ways to combine these dataframes.

+2


source







All Articles