# R does not omit the base level of factorial interaction with numeric if the main effect of the number comes from a function

I know this problem can be worked around by creating precomputed transformations, but I'd really like to use the R formula functionality. Here is a reproducible example of my problem:

Generate (correlated) toy data:

```
set.seed(123)
test<-data.frame(x=rnorm(100,1,.5),z=factor(sample(c('a','b','c'),100,T)))
test$y<-.3*test$x+0*(test$z=='a')-.07*(test$z=='b')-.15*(test$z=='c')+rnorm(100,0,.1)
```

Run the linear model:

```
> lm(y ~ x + z, test)
Call:
lm(formula = y ~ x + z, data = test)
Coefficients:
(Intercept) x zb zc
0.02453 0.27484 -0.08279 -0.12868
```

Looks good. The first factor-level "a" is omitted as it should be. Now enable interaction between numeric x and multiplier z:

```
> lm(y ~ x + z + z:x, test)
Call:
lm(formula = y ~ x + z + z:x, data = test)
Coefficients:
(Intercept) x zb zc x:zb x:zc
0.037008 0.262650 -0.134938 -0.118896 0.049068 -0.009225
lm(y ~ poly(x,2) + z:x, test)
```

Everything is still good. Now use the "poly" function to add the quadratic transformation x:

```
> lm(y ~ poly(x, 2) + z + z:x, test)
Call:
lm(formula = y ~ poly(x, 2) + z + z:x, data = test)
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2 zb zc za:x zb:x zc:x
0.33928 1.23017 -0.18029 -0.15478 -0.15574 -0.02749 0.04165 NA
```

And here he is. Instead of eliminating the first level z 'a' in terms of interaction, it is included along the other two levels. Now za: x is 'aliased' because the model will of course be the only one with all three factors included. This is bad because functions like "vif" from the "car" package don't work:

```
> vif(lm(y ~ poly(x,2) + z + z:x, test))
Error in vif.lm(lm(y ~ poly(x, 2) + z + z:x, test)) :
there are aliased coefficients in the model
```

I've tried things like y ~ poly (x, 2) + z + z: poly (x, 1) or y ~ poly (x, 2) + z + release (z, ref = 'a'): x but nothing worked. Is this a bug or can someone explain this result? Is there a way to avoid this problem and use the functionality of the formula the way I intended? Thank.

source to share

Since formulas allow you to use any function, there is no way for R to know which functions will return values equal to other values already in the equation. There is no specific coding for `poly()`

.

If you just want to include the term `x`

and `x^2`

, you can do

```
lm(formula = y ~ x + I(x^2) + z + z:x, data = test)
```

avoiding using `poly()`

everything together. You just have to be more careful when constructing your formula.

source to share