How to fix the intercept value in glm
Here's an example I'm working on:
data2 = data.frame( X = c(0,2,4,6,8,10),
Y = c(300,220,210,90,80,10))
attach(data2)
model <- glm(log(Y)~X)
model
Call: glm(formula = log(Y) ~ X)
Coefficients:
(Intercept) X
6.0968 -0.2984
Degrees of Freedom: 5 Total (i.e. Null); 4 Residual
Null Deviance: 7.742
Residual Deviance: 1.509 AIC: 14.74
My question is:
Is there an option in the glm function that allows me to fix the coefficients at values I choose, and then make predictions?
For example: I want my curve to start at the topmost Y value, i.e. I want to set the intercept to log(300).
You are misusing glm(...), and IMO that is a much bigger problem than a biased intercept.
The basic assumption in least squares regression is that the error in the response is normally distributed with constant variance. If the error in Y is normally distributed, then the error in log(Y) most certainly is not. So while you can "run the numbers" on a fit of log(Y)~X, the results will not be meaningful. The theory of generalized linear modeling was developed to deal with exactly this problem. So, using glm, rather than fitting log(Y)~X you should fit Y~X with family=poisson. The former fits
log(Y) = b0 + b1*X
while the latter fits
Y = exp(b0 + b1*X)
In the latter case, if the errors in Y are normally distributed, and if the model is valid, then the residuals will be normally distributed, as required. Note that the two approaches give very different results for b0 and b1:
fit.incorrect <- glm(log(Y)~X,data=data2)
fit.correct <- glm(Y~X,data=data2,family=poisson)
coef(summary(fit.incorrect))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 6.0968294 0.44450740 13.71592 0.0001636875
# X -0.2984013 0.07340798 -4.06497 0.0152860490
coef(summary(fit.correct))
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 5.8170223 0.04577816 127.06982 0.000000e+00
# X -0.2063744 0.01122240 -18.38951 1.594013e-75
In particular, the estimated coefficient of X is almost 30% lower in magnitude with the correct approach.
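You can back-compute that figure from the two fits above (a quick check; this line is my addition, not part of the original output):
1 - coef(fit.correct)["X"]/coef(fit.incorrect)["X"]   # ~0.31, i.e. about 30% smaller in magnitude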
Note how the models differ:
plot(Y~X,data2)
curve(exp(coef(fit.incorrect)[1]+x*coef(fit.incorrect)[2]),
add=T,col="red")
curve(predict(fit.correct, type="response",newdata=data.frame(X=x)),
add=T,col="blue")
The correct fit (blue curve) passes through the data in a more or less random fashion, while the incorrect fit (red curve) grossly overestimates the data for small X and underestimates it for larger X (the residual check below makes this explicit). Incidentally, I wonder why you want to "fix" the intercept at all. Looking at the other answer, you can see that when you force Y0 = 300, the fit underestimates throughout.
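To see the pattern numerically, here is a quick look at the response-scale residuals (my addition, assuming the two fits above):
data2$Y - exp(predict(fit.incorrect))            # incorrect fit: systematic sign pattern along X
data2$Y - predict(fit.correct,type="response")   # correct fit: no obvious pattern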
By contrast, let's see what happens when we fix Y0 using glm properly.
data2$b0 <- log(300) # add the offset as a separate column
# b0 not fixed
fit <- glm(Y~X,data=data2,family=poisson)
plot(Y~X,data2)
curve(predict(fit,type="response",newdata=data.frame(X=x)),
add=TRUE,col="blue")
# b0 fixed so that Y0 = 300
fit.fixed <- glm(Y~X-1+offset(b0),data=data2,family=poisson)
curve(predict(fit.fixed,type="response",newdata=data.frame(X=x,b0=log(300))),
add=TRUE,col="green")
Here the blue curve is the unconstrained fit (done properly), and the green curve is the fit constrained so that Y0 = 300. You can see that the two do not differ very much, because the correct (unconstrained) fit is already quite good.
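If you want to put numbers on "not very different", one quick comparison (my addition, using the fits above) is the coefficients and residual deviances of the two models:
coef(fit)        # intercept is estimated freely (close to log(300) = 5.704)
coef(fit.fixed)  # only the slope is estimated; the intercept is fixed by the offset
c(unconstrained=deviance(fit),constrained=deviance(fit.fixed))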
If you just want to fix the intercept of the log-linear model at log(300), you can do that with an offset:
data2 <- data.frame( X = c(0,2,4,6,8,10),
                     Y = c(300,220,210,90,80,10))
m1 <- lm(log(Y)~X-1+offset(rep(log(300),nrow(data2))),data2)
There is a predict() method, but it's probably easier to just compute the predictions by hand here (a predict()-based variant is sketched after the plot below).
par(las=1,bty="l")
plot(Y~X,data=data2)
curve(300*exp(coef(m1)*x),add=TRUE)
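If you would rather use predict(), one way is to move the offset into a column of the data, mirroring the glm example in the other answer, so that predict() can find it in newdata. This is a sketch (my variation, not part of the original answer; m2, xx, and yy are names I've introduced):
data2$b0 <- log(300)                  # fixed intercept as a column
m2 <- lm(log(Y)~X-1+offset(b0),data=data2)
xx <- seq(0,10,length.out=101)
yy <- exp(predict(m2,newdata=data.frame(X=xx,b0=log(300))))
lines(xx,yy,col="purple",lty=2)       # should coincide with the hand-computed curve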
For what it's worth, if you want to compare the log-Normal and Poisson fits graphically, you can do it via ggplot2 (current versions of ggplot2 want the family passed through method.args rather than directly):
library("ggplot2")
theme_set(theme_bw())
ggplot(data2,aes(X,Y))+geom_point()+
    geom_smooth(method="glm",method.args=list(family=quasipoisson))+
    geom_smooth(method="glm",
                method.args=list(family=quasi(link="log",variance="mu^2")),
                colour="red",fill="red")