# How to get RMSE from lm result?

I know there is a slight difference between `$sigma` and the concept of **mean squared error**. So I'm wondering, what is the easiest way to get the RMSE from an `lm` fit in **R**?

```
res <- lm(price ~ carat + cut + color + clarity + depth +
            table + x + y + z, data = randomData)
length(coefficients(res))
```

This model contains 24 coefficients, so I can no longer write it out by hand. How can I estimate the RMSE from the coefficients returned by `lm`?

---

Residual sum of squares:

```
RSS <- c(crossprod(res$residuals))
```

Mean square error:

```
MSE <- RSS / length(res$residuals)
```

Root MSE:

```
RMSE <- sqrt(MSE)
```

Pearson estimate of the residual variance (as returned by `summary.lm`):

```
sig2 <- RSS / res$df.residual
```

Statistically, MSE is the maximum likelihood estimator of the residual variance, but it is biased (downward). The Pearson estimate is the restricted maximum likelihood (REML) estimator of the residual variance, and it is unbiased.
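
Here is a minimal end-to-end sketch of the steps above; the built-in `mtcars` data and the formula are stand-ins, since the question's `randomData` is not available:

```
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

RSS  <- c(crossprod(fit$residuals))   # residual sum of squares
MSE  <- RSS / length(fit$residuals)   # divide by n (ML estimate, biased)
RMSE <- sqrt(MSE)
sig2 <- RSS / fit$df.residual         # divide by n - p (unbiased)

# sig2 is exactly the squared "residual standard error" from summary.lm
all.equal(sig2, summary(fit)$sigma^2)  # TRUE
```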

**Comment**

- Given two vectors `x` and `y`, `c(crossprod(x, y))` is equivalent to `sum(x * y)` but much faster. Likewise, `c(crossprod(x))` is faster than `sum(x ^ 2)`, and `sum(x) / length(x)` is faster than `mean(x)`.
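
To verify the speed claim on your own machine, here is a minimal sketch using base R's `system.time` (exact timings are machine- and BLAS-dependent, and the vector size is an arbitrary choice):

```
x <- rnorm(1e7)

# both compute the sum of squares; crossprod dispatches to BLAS
system.time(for (i in 1:10) s1 <- sum(x ^ 2))
system.time(for (i in 1:10) s2 <- c(crossprod(x)))

all.equal(s1, s2)  # TRUE: same value, up to floating-point tolerance
```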

---

I think the other answers may be wrong. The MSE of a regression is the SSE divided by (n - k - 1), where n is the number of data points and k is the number of model parameters, not counting the intercept.

Simply taking the mean of the squared residuals (as the other answer suggests) amounts to dividing by n instead of (n - k - 1).

I would calculate the RMSE as `sqrt(sum(res$residuals^2) / res$df.residual)`.

The denominator, `res$df.residual`, is the residual degrees of freedom, which equals (n - k - 1). See this for reference: https://www3.nd.edu/~rwilliam/stats2/l02.pdf
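
A minimal sketch contrasting the two denominators, again using the built-in `mtcars` data as a stand-in for the question's data:

```
res <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n       <- length(res$residuals)
rmse_n  <- sqrt(sum(res$residuals^2) / n)                # divide by n
rmse_df <- sqrt(sum(res$residuals^2) / res$df.residual)  # divide by n - k - 1

rmse_n < rmse_df                         # TRUE: dividing by n understates the error
all.equal(rmse_df, summary(res)$sigma)   # TRUE: matches the residual standard error
```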
