R: Confusion about what level means in the predict function
I used Excel to calculate a 95% confidence interval for a predicted value. For the t-value I used TINV(5%,6), which splits the 5% into 2.5% on each side; 6 is the degrees of freedom.
In R, when I pass level = 0.95 to predict, I get a different interval. Passing level = 0.975, however, gives me the same answer as Excel. So it seems that with level = 0.975 it puts 2.5% on each side.
But the examples I find online all pass level = 0.95 when they want a 95% confidence interval. To me that would mean 5% on each side, which is a 90% interval, so the level should be 0.975 for a 95% interval.
What's happening? I am probably confused.
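For reference, here is a minimal sketch of how a confidence level maps to tail areas and t quantiles in R. It assumes, as described above, that Excel's TINV(5%,6) is the two-tailed inverse; the variable names (level, dof, tail, t_two_sided) are illustrative only.

```r
# Sketch: how a confidence level translates into tail areas and a t multiplier.
level <- 0.95              # overall confidence level
dof   <- 6                 # degrees of freedom, as in the question
tail  <- (1 - level) / 2   # area left in EACH tail: 0.025

# Two-sided multiplier for a 95% interval; this is the quantity that a
# two-tailed inverse such as TINV(5%, 6) is assumed to return (about 2.45).
t_two_sided <- qt(1 - tail, dof)
print(t_two_sided)

# level = 0.975 leaves only 1.25% in each tail, so the multiplier grows
# (about 2.97) and the interval becomes wider.
t_wider <- qt(1 - (1 - 0.975) / 2, dof)
print(t_wider)
```

Under this reading, level = 0.95 already puts 2.5% in each tail, so it is the setting that corresponds to TINV(5%,6).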
EDIT:
predict(model, data.frame(c= 12.75, p= 6, f=8), level = 0.975, interval = "confidence")
The model here is a multiple linear regression.
Data:
y <- c(85.10,106.30,50.20,130.60,54.80,30.30,79.40,91.00,135.40,89.30) # Total Sales
c <- c(8.50,12.90,5.20,10.70,3.10,3.50,9.20,9.00,15.10,10.20) # production cost
p <- c(5.10,5.80,2.10,8.40,2.90,1.20,3.70,7.60,7.70,4.50) # Promotion cost
f <- c(4.70,8.80,15.10,12.20,10.60,3.50,9.70,5.90,20.80,7.90) #First year box office
model <- lm(y ~ c + p + f)
Excel:
(In my spreadsheet, the relevant cells are highlighted in yellow.)
The problem is that in Excel, using =TINV(5%,6), I get a prediction of 106.72 with an upper limit of 119.35 and a lower limit of 93.36.
In R, with level = 0.95, I get a prediction of 106.72 with an upper limit of 117.7 and a lower limit of 95.65.
With level = 0.975, I get the same values as Excel.
In Excel:
=TINV(5%,6) = 2.45
Variance = 5.46
106.72 +/- tvalue*variance
upper: 119.35
lower: 93.36
In R:
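predi <- predict(model, data.frame(c = 12.75, p = 6, f = 8), se.fit = TRUE) # assumed definition of predi (not shown in the post)
se.ci <- predi$se.fit # Variance (strictly, the standard error of the fit): 4.518
alpha <- qt((1-0.95)/2, 6) # lower-tail t quantile, value: -2.45
predi$fit[1] + c(alpha, -alpha) * se.ci # gives me 95.65941 117.77165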
As you can see, the t-value is the same, but the interval is different.
But when I do this:
alpha <- qt((1-0.975)/2,6) # Value: -2.968
I get 93.30182 120.12924, the same as Excel! (Using level = 0.975 in predict gives me that answer too, hence the confusion.)
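In simple linear regression, the prediction interval for y at a given x* is:
$$\hat{y}^* \pm t_{1-\alpha/2,\,n-2}\; s_y \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{(n-1)\,s_x^2}}$$
where \bar{x} and s_x are the sample mean and standard deviation of x, and s_y is given by:
$$s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}}$$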
Consider the following example:
df <- faithful
n <- nrow(df)
names(df) <- c("y","x")
mx <- mean(df$x)
sx <- sd(df$x)
mod <- lm(y ~ x, data = df)
yhat <- predict(mod) # fitted values at the observed x
xnew <- 80
newdata <- data.frame(x = xnew)
alpha <- 0.05
(ypred <- predict(mod, newdata, interval = "prediction", level = 1 - alpha))
#### 95% Prediction interval #####
fit lwr upr
1 4.17622 3.196089 5.156351
We can calculate this interval "manually" using the above formula:
SE <- sqrt(sum((df$y-yhat)^2)/(n-2))*sqrt(1+1/n+(xnew-mx)^2/((n-1)*sx^2))
tval <- qt(1-alpha/2,n-2)
c(ypred[1]-tval*SE, ypred[1]+tval*SE)
#### 95% Prediction interval #####
[1] 3.196089 5.156351
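The same kind of check can be sketched for the multiple regression in the question, this time for a confidence interval rather than a prediction interval. This is only an illustration: the data are copied from the question, while the names sales, ci_predict, pr, tval and ci_manual are introduced here and are not part of the original post.

```r
# Data copied from the question, stored in a data frame (the name `sales`
# is an illustrative choice, not from the original post).
sales <- data.frame(
  y = c(85.10, 106.30, 50.20, 130.60, 54.80, 30.30, 79.40, 91.00, 135.40, 89.30),
  c = c(8.50, 12.90, 5.20, 10.70, 3.10, 3.50, 9.20, 9.00, 15.10, 10.20),
  p = c(5.10, 5.80, 2.10, 8.40, 2.90, 1.20, 3.70, 7.60, 7.70, 4.50),
  f = c(4.70, 8.80, 15.10, 12.20, 10.60, 3.50, 9.70, 5.90, 20.80, 7.90)
)
model   <- lm(y ~ c + p + f, data = sales)
newdata <- data.frame(c = 12.75, p = 6, f = 8)
level   <- 0.95

# Confidence interval as computed by predict()
ci_predict <- predict(model, newdata, interval = "confidence", level = level)

# The same interval by hand: fit +/- t * se.fit, with (1 - level) / 2 in each tail
pr   <- predict(model, newdata, se.fit = TRUE)
tval <- qt(1 - (1 - level) / 2, df = pr$df)   # pr$df is the residual df (10 - 4 = 6)
ci_manual <- c(pr$fit - tval * pr$se.fit, pr$fit + tval * pr$se.fit)

# Compare the two; unname() drops row/column labels before the comparison
print(all.equal(unname(ci_predict[1, c("lwr", "upr")]), unname(ci_manual)))
```

If the two agree, level = 0.95 is already the two-sided 95% interval. Any remaining gap to the Excel result would then have to come from the standard error (5.46 in Excel versus 4.518 from se.fit in the question), not from the t-value.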