Use poly () in R formula to predict

I have a question about formula and custom function:

Case 1:

 clotting <- data.frame(
     u = c(5,10,15,20,30,40,60,80,100),
     lot1 = c(118,58,42,35,27,25,21,19,18),
     lot2 = c(69,35,26,21,18,16,13,12,12))

 g1 = glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
 dc = clotting
 dc$u = 1
 predict(g1, dc)

      1           2           3           4           5           6           7           8           9
 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

      

However, if I just simply move poly as a custom function (in reality I will have my own more complex function), then I get the error:

Case 2:

 xpoly <- function(x, degree=1){poly(x,degree)}
 g2 = glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
 predict(g2, dc)
       Error in poly(x, degree) :
      'degree' must be less than number of unique points

      

The prediction seems to be processing a user-defined function in the formula using I (). My question is, how can I get the results for Case2 in the same way as case1?

Can anyone have any idea about this?

+3


source to share


1 answer


poly

here's a slightly unique feature. By default, it returns a set of orthogonal polynomials, so it does the centering and scaling of the data. If you want to predict the use of coefficients from the established model, you will need to transform the new data in the same way as you did with the original data. This means that some additional data must be transferred together.

I'll point out first that if you're using raw, non-orthogonal values, you won't run into this problem.

g1 <- glm(lot1 ~ log(u) + poly(u,1, raw=T), data = clotting, family = Gamma)
xpoly<-function(x,degree=1){poly(x,degree, raw=T)}
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)

dc=clotting
dc$u=1
predict(g1,dc)
#       1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 
predict(g2,dc)
#       1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

      

But let's look at how the poly

scaling information is passed on predict

. The work really happens in the function model.frame

. Compare these two results.

attr(terms(model.frame(lot1 ~ log(u) + poly(u,1), clotting)), "predvar")
# list(lot1, log(u), poly(u, 1, coefs = list(alpha = 40, norm2 = c(1, 
9, 8850))))
attr(terms(model.frame(lot1 ~ log(u) + xpoly(u,1), clotting)), "predvar")
# list(lot1, log(u), xpoly(u, 1))

      

You can see that the call poly()

in the first formula has been adjusted in the attribute of the predvar

returned formula. This is done in codemodel.frame

...
if (is.null(attr(formula, "predvars"))) {
    for (i in seq_along(varnames)) predvars[[i + 1L]] <- makepredictcall(variables[[i]], 
        vars[[i + 1L]])
    attr(formula, "predvars") <- predvars
}
...

      

Note that it calls a function makepredictcall()

, which is a generic function that dispatches based on the class of the returned object. It happens that it poly

returns an object of class "poly"



class(poly(1:5, 1))
# [1] "poly"   "matrix"

      

So what is called for "poly" data is the function

stats:::makepredictcall.poly
function (var, call) 
{
    if (as.character(call)[1L] != "poly") 
        return(call)
    call$coefs <- attr(var, "coefs")
    call
}
<bytecode: 0x123262178>
<environment: namespace:stats>

      

Attributes are added here coef=

. But also note that it verifies that the call was caused by poly itself. Because your function is called "xpoly" but returns a "poly" object, no information about the ratio is returned. One way would be to change the return class of your object and create your own function makepredictcall

. For example, you could do

xpoly <- function(...){p<-poly(...); class(p)[1]<-"xpoly"; p}
makepredictcall.xpoly <- function(var, call) {
    call$coefs <- attr(var, "coefs")
    call
}

      

Note that this new version xpoly

also takes an argument coef=

and passes it to before poly()

using parameters ...

. Then you can run

g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g1,dc)
#          1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
predict(g2,dc)
#          1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

      

+4


source







All Articles