Why doesn't removing a term from an Anova model with a factor variable in R reduce the degrees of freedom?

I am trying to do an ANCOVA (a combination of ANOVA and linear regression) between different models and I am running into problems. I think I have narrowed it down to an issue with ANOVA (or something I don't understand or am doing wrong): to compare two models, they must have different residual Df (degrees of freedom).

As an example, consider the mtcars data in R:

library(car)    # for Anova()
library(dplyr)  # for mutate() and the %>% pipe

test_data <- mtcars %>% mutate(factored_variable = as.factor(carb))

model_1 <- aov(drat ~ factored_variable , data = test_data)
Anova(model_1, type = "III")

    # Anova Table (Type III tests)
    # 
    # Response: drat
    # Sum Sq Df  F value                Pr(>F)    
    # (Intercept)       94.870  1 313.3656 0.0000000000000005038 ***
    #   factored_variable  0.991  5   0.6546                0.6607    
    # Residuals          7.871 26                                   
    # ---
    #   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

model_2 <- aov(drat ~ factored_variable - 1, data = test_data)
Anova(model_2, type = "III")

    # Anova Table (Type III tests)
    # 
    # Response: drat
    # Sum Sq Df F value                Pr(>F)    
    # factored_variable 414.92  6  228.42 < 0.00000000000000022 ***
    #   Residuals           7.87 26                                  
    # ---
    #   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


So, I created two models to predict the drat value. In the first, a factor variable with coefficients (Df = number of levels - 1 = 5) and an intercept (always Df = 1) are fitted, so 6 Df are used. In the second model I removed the intercept, so only the factor variable remains. I would then expect this variable to use only 5 Df, but apparently this is not the case, as Anova says there are 6.
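As a quick sanity check, the design matrix that R builds for each formula can be inspected with model.matrix(); on this test_data both formulas produce 6 columns, i.e. 6 estimated parameters:

# number of parameters each formula actually estimates
ncol(model.matrix(drat ~ factored_variable, data = test_data))      # 6: intercept + 5 dummies
ncol(model.matrix(drat ~ factored_variable - 1, data = test_data))  # 6: one dummy per level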

So my question is: why is this Df 6 and not 5? I think it has to do with the fact that the variable is a factor, but I don't understand why. Is it not possible to compare two models with such a variable?

Edit: thanks for the answer. I think I misunderstood the theory, not R; it is a little clearer now.



1 answer


Your two models are essentially the same model, but in the second one you have forced the intercept to be zero. Removing the intercept does not change the degrees of freedom, because it results in parameter estimates for all 6 levels of factored_variable, rather than 6 - 1 = 5 levels of factored_variable plus an intercept.
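You can confirm this directly from the fitted models in the question (model_1 and model_2 above): each one estimates 6 parameters from the 32 rows of mtcars, so both leave the same residual degrees of freedom.

# both models estimate 6 parameters, so the residual Df is identical
df.residual(model_1)  # 26
df.residual(model_2)  # 26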

To show that the models are otherwise equivalent (and that each is equivalent to a regression model), we can fit the corresponding linear regression models and then look at the coefficients.

aov1 <- aov(drat ~ factored_variable, data = test_data)
aov2 <- aov(drat ~ factored_variable - 1, data = test_data)

lm1 <- lm(drat ~ factored_variable, data = test_data)
lm2 <- lm(drat ~ factored_variable - 1, data = test_data)


Now let's look at the coefficients for the four models, shown in the code and output below. aov1 and lm1 estimate an intercept plus 5 coefficients for factored_variable. The coefficient for the omitted (reference) category of factored_variable is the intercept, and the other coefficients are the differences between each category and the reference category. aov2 and lm2 estimate the absolute value for each category of factored_variable, rather than the value relative to the reference category.



coefs <- data.frame(aov1 = coef(aov1), aov2 = coef(aov2), lm1 = coef(lm1), lm2 = coef(lm2))


                                         aov1     aov2         lm1      lm2
(Intercept)/factored_variable1     3.68142857 3.681429  3.68142857 3.681429
factored_variable2                 0.01757143 3.699000  0.01757143 3.699000
factored_variable3                -0.61142857 3.070000 -0.61142857 3.070000
factored_variable4                -0.08542857 3.596000 -0.08542857 3.596000
factored_variable6                -0.06142857 3.620000 -0.06142857 3.620000
factored_variable8                -0.14142857 3.540000 -0.14142857 3.540000


Note that the model pairs lm1/aov1 and lm2/aov2 have the same coefficients. For models aov1 and lm1, if you add the coefficient for each level of factored_variable to the intercept, you will see that the resulting values match the coefficients of lm2 and aov2. In each case, the model estimates six parameters.
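To make this concrete, one more small check using the lm1 and lm2 fits above: adding the intercept of lm1 to each of its difference coefficients recovers the lm2 coefficients (up to rounding).

# rebuild the absolute category means from the intercept-plus-differences parameterization
reconstructed <- coef(lm1)[1] + c(0, coef(lm1)[-1])
all.equal(unname(reconstructed), unname(coef(lm2)))  # should be TRUE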
