R does not omit the base level of factorial interaction with numeric if the main effect of the number comes from a function

I know this problem can be worked around by creating precomputed transformations, but I'd really like to use the R formula functionality. Here is a reproducible example of my problem:

Generate (correlated) toy data:



Run the linear model:

> lm(y ~ x + z, test)
lm(formula = y ~ x + z, data = test)

(Intercept)            x           zb           zc  
    0.02453      0.27484     -0.08279     -0.12868


Looks good. The first factor-level "a" is omitted as it should be. Now enable interaction between numeric x and multiplier z:

> lm(y ~ x + z + z:x, test)
lm(formula = y ~ x + z + z:x, data = test)

(Intercept)            x           zb           zc         x:zb         x:zc  
   0.037008     0.262650    -0.134938    -0.118896     0.049068    -0.009225 
        lm(y ~ poly(x,2) + z:x, test)


Everything is still good. Now use the "poly" function to add the quadratic transformation x:

> lm(y ~ poly(x, 2) + z + z:x, test)

lm(formula = y ~ poly(x, 2) + z + z:x, data = test)

(Intercept)  poly(x, 2)1  poly(x, 2)2           zb           zc         za:x         zb:x         zc:x  
    0.33928      1.23017     -0.18029     -0.15478     -0.15574     -0.02749      0.04165           NA  


And here he is. Instead of eliminating the first level z 'a' in terms of interaction, it is included along the other two levels. Now za: x is 'aliased' because the model will of course be the only one with all three factors included. This is bad because functions like "vif" from the "car" package don't work:

> vif(lm(y ~ poly(x,2) + z + z:x, test))
Error in vif.lm(lm(y ~ poly(x, 2) + z + z:x, test)) : 
  there are aliased coefficients in the model


I've tried things like y ~ poly (x, 2) + z + z: poly (x, 1) or y ~ poly (x, 2) + z + release (z, ref = 'a'): x but nothing worked. Is this a bug or can someone explain this result? Is there a way to avoid this problem and use the functionality of the formula the way I intended? Thank.


source to share

1 answer

Since formulas allow you to use any function, there is no way for R to know which functions will return values ​​equal to other values ​​already in the equation. There is no specific coding for poly()


If you just want to include the term x

and x^2

, you can do

lm(formula = y ~ x + I(x^2) + z + z:x, data = test)


avoiding using poly()

everything together. You just have to be more careful when constructing your formula.



All Articles