Using ggplot2 to fit a linear model to data with errors
I have this dataframe:
> dat
   x         y        yerr
1 -1 -1.132711 0.001744498
2 -2 -2.119657 0.003889120
3 -3 -3.147378 0.007521881
4 -4 -4.220129 0.012921450
5 -5 -4.586586 0.021335644
6 -6 -5.389198 0.032892630
7 -7 -6.002848 0.048230946
And I can plot it with error bars like:
p <- ggplot(dat, aes(x=x, y=y)) + geom_point()
p <- p + geom_errorbar(data=dat, aes(x=x, ymin=y-yerr, ymax=y+yerr), width=0.09)
p + geom_smooth(method = "lm", formula = y ~ x)
But I need to use yerr to fit my linear model. Is this possible with ggplot2?
Well, I found a way to answer this question.
In any correctly performed scientific experiment, every measured value has an associated error. In some cases the variance of that error is the same at all points, but in many, as with the data in the original question, it is not. The fit should therefore take the differing error values of the different measurements into account.
We do this by attaching a weight to each observation that decreases with its error value, here 1/sqrt(yerr), so the call becomes:
p <- ggplot(dat, aes(x=x, y=y, weight = 1/sqrt(yerr))) +
geom_point() +
geom_errorbar(aes(ymin=y-yerr, ymax=y+yerr), width=0.09) +
geom_smooth(method = "lm", formula = y ~ x)
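As a sketch of what the weight aesthetic is doing here, the same weighted fit can be run directly with lm(); the chunk below re-creates dat from the question so it runs standalone, and compares the unweighted and weighted coefficients:

```r
# Re-create the data from the question
dat <- data.frame(
  x    = -(1:7),
  y    = c(-1.132711, -2.119657, -3.147378, -4.220129,
           -4.586586, -5.389198, -6.002848),
  yerr = c(0.001744498, 0.003889120, 0.007521881, 0.012921450,
           0.021335644, 0.032892630, 0.048230946)
)

# Ordinary vs weighted least squares: the weights down-weight the
# observations with the largest yerr (the most negative x values)
fit_ols <- lm(y ~ x, data = dat)
fit_wls <- lm(y ~ x, data = dat, weights = 1/sqrt(yerr))
coef(fit_ols)
coef(fit_wls)
```

The two coefficient vectors differ because the weighted fit trusts the low-error points more.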
For any model fit, I would do the fitting outside of the plotting paradigm. To do this, pass lm() a weights argument
that is inversely proportional to the variance of the observations. The fitting will then be performed by a weighted least squares procedure.
In your example/situation, ggplot's geom_smooth
does this for you. Whilst it is arguably easier to use geom_smooth,
the benefits of fitting the model directly outweigh that in the long run. First, you have a fitted model object and can run diagnostics on the fit, the model assumptions, and so on.
Fit the weighted least squares model:
mod <- lm(y ~ x, data = dat, weights = 1/sqrt(yerr))
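Having mod in hand gives you the usual battery of lm diagnostics; a minimal sketch (the data frame is re-created from the question so the chunk runs standalone):

```r
# Re-create the data and the weighted fit from the question
dat <- data.frame(
  x    = -(1:7),
  y    = c(-1.132711, -2.119657, -3.147378, -4.220129,
           -4.586586, -5.389198, -6.002848),
  yerr = c(0.001744498, 0.003889120, 0.007521881, 0.012921450,
           0.021335644, 0.032892630, 0.048230946)
)
mod <- lm(y ~ x, data = dat, weights = 1/sqrt(yerr))

summary(mod)          # coefficients, standard errors, R-squared
confint(mod)          # 95% confidence intervals for intercept and slope
par(mfrow = c(2, 2))
plot(mod)             # residuals, Q-Q, scale-location, leverage plots
```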
Then predict()
from the model over the range of x:
newx <- with(dat, data.frame(x = seq(min(x), max(x), length = 50)))
pred <- predict(mod, newx, interval = "confidence", level = 0.95)
In the above, the predict.lm
method is used; it returns a matrix with columns fit, lwr and upr holding the fitted values and the appropriate confidence interval.
Then prepare the data for plotting
pdat <- with(data.frame(pred),
             data.frame(x = newx$x, y = fit, ymax = upr, ymin = lwr))
Then build a graph
library(ggplot2)
p <- ggplot(dat, aes(x = x, y = y)) +
geom_point() +
geom_line(data = pdat, colour = "blue") +
geom_ribbon(mapping = aes(ymax = ymax, ymin = ymin), data = pdat,
alpha = 0.4, fill = "grey60")
p
Your question is a little vague. Here are some suggestions to help you get started.
- ggplot2 just uses the lm
regression function. To get the coefficients yourself, just do: lm(y ~ x, data=dat)
This will give you the y-intercept and the gradient.
- You can turn off the standard error band in stat_smooth
with the se
argument: .... + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
- You can add a ribbon across your points / error bars with:
##This doesn't look good. .... + geom_ribbon(aes(x=x, ymax=y+yerr, ymin=y-yerr))
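Following the first suggestion, pulling the intercept and gradient out of the fitted object looks like this (re-creating dat from the question so the chunk runs standalone):

```r
# Re-create the data from the question
dat <- data.frame(
  x    = -(1:7),
  y    = c(-1.132711, -2.119657, -3.147378, -4.220129,
           -4.586586, -5.389198, -6.002848),
  yerr = c(0.001744498, 0.003889120, 0.007521881, 0.012921450,
           0.021335644, 0.032892630, 0.048230946)
)

fit <- lm(y ~ x, data = dat)
coef(fit)          # named vector: "(Intercept)" and "x"
coef(fit)[["x"]]   # the gradient on its own
```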