Using ggplot2 to fit a linear model that takes the errors in the data into account

I have this dataframe:

> dat
   x         y        yerr
1 -1 -1.132711 0.001744498
2 -2 -2.119657 0.003889120
3 -3 -3.147378 0.007521881
4 -4 -4.220129 0.012921450
5 -5 -4.586586 0.021335644
6 -6 -5.389198 0.032892630
7 -7 -6.002848 0.048230946

And I can plot it with error bars like:

p <- ggplot(dat, aes(x=x, y=y)) + geom_point()
p <- p + geom_errorbar(data=dat, aes(x=x, ymin=y-yerr, ymax=y+yerr), width=0.09)
p + geom_smooth(method = "lm", formula = y ~ x)

(plot: the data points with error bars and the unweighted linear fit)

But I need to use yerr to fit my linear model. Is this possible with ggplot2?



3 answers


Well, I found a way to answer this question.

In any scientific experiment where we collect data, if the experiment is performed correctly, every data value should have an associated error.

In some cases the variance of the error is the same at all points, but in many, as with the data in the original question, it is not. We therefore need to account for the differing error values of the individual measurements when fitting a curve to the data.



Thus, we need to derive a weight for each observation from its error value; here each observation is weighted by 1/sqrt(yerr), so the plot becomes:

p <- ggplot(dat, aes(x=x, y=y, weight = 1/sqrt(yerr))) + 
    geom_point() + 
    geom_errorbar(aes(ymin=y-yerr, ymax=y+yerr), width=0.09) + 
    geom_smooth(method = "lm", formula = y ~ x)

(plot: the same data, with the linear fit now weighted by the measurement errors)
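
As an aside, which weights are appropriate depends on what yerr represents. If yerr is a standard deviation, the textbook weighted least squares choice is inverse-variance weights, i.e. 1/yerr^2; here is a minimal sketch of that variant (treating yerr as a standard deviation is an assumption, not something the question settles):

library(ggplot2)

## same plot, but with inverse-variance weights: this assumes yerr is a
## standard deviation, so the variance of each observation is yerr^2
ggplot(dat, aes(x = x, y = y, weight = 1/yerr^2)) +
    geom_point() +
    geom_errorbar(aes(ymin = y - yerr, ymax = y + yerr), width = 0.09) +
    geom_smooth(method = "lm", formula = y ~ x)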



For any model fitting, I would fit the model outside of the plotting paradigm we are using. To do this, supply a weights argument that is inversely proportional to the uncertainty of the observations; the fitting will then be performed using a weighted least squares procedure.

In your example, ggplot's geom_smooth does this for you. Whilst it is arguably easier to use geom_smooth, the benefits of fitting the model directly outweigh that convenience in the long run. First, you have the fitted model itself and can run diagnostics on the fit, check model assumptions, and so on.

Fit the model using weighted least squares

mod <- lm(y ~ x, data = dat, weights = 1/sqrt(yerr))
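
Having the model object in hand also makes the diagnostics mentioned above immediate, for example:

## coefficients, standard errors and R-squared of the weighted fit
summary(mod)

## standard lm diagnostic plots (residuals vs fitted, Q-Q plot, etc.)
par(mfrow = c(2, 2))
plot(mod)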

      

Then predict() from the model over the range of x

newx <- with(dat, data.frame(x = seq(min(x), max(x), length = 50)))
pred <- predict(mod, newx, interval = "confidence", level = 0.95)

      



In the above, the predict.lm method is used to generate an appropriate confidence interval for the fitted line.
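
For reference, with interval = "confidence" the value returned by predict() is a matrix with columns fit, lwr and upr, which is why it is wrapped in data.frame() in the next step:

head(data.frame(pred))   ## columns: fit, lwr, upr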

Then prepare the data for plotting

## combine the new x values with the fitted line and its confidence band
pdat <- with(data.frame(pred),
             data.frame(x = newx$x, y = fit, ymax = upr, ymin = lwr))

Then draw the plot

require(ggplot2)
p <- ggplot(dat, aes(x = x, y = y)) +
       geom_point() +
       geom_line(data = pdat, colour = "blue") +       ## fitted line
       geom_ribbon(mapping = aes(ymax = ymax, ymin = ymin), data = pdat, 
                   alpha = 0.4, fill = "grey60")       ## 95% confidence band
p
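
If you also want the original error bars drawn on top of this fit, they can be layered onto p in the usual way:

p + geom_errorbar(aes(ymin = y - yerr, ymax = y + yerr), width = 0.09)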

      



Your question is a little vague. Here are some suggestions to help you get started.

  • ggplot2 just uses the lm regression function. To get the coefficient values, just do:

     lm(y ~ x, data=dat)

    this will give you the y-intercept and the gradient.

  • You can turn off the standard error band of stat_smooth with its se argument:

    .... + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

  • You can add a ribbon across your points/error bars (a more readable variant follows below):

    ##This doesn't look good. 
    .... + geom_ribbon(aes(x = x, ymax = y + yerr, ymin = y - yerr))
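
    Since an opaque ribbon hides the points, a more readable sketch adds transparency (the alpha and fill values here are arbitrary choices):

    .... + geom_ribbon(aes(x = x, ymax = y + yerr, ymin = y - yerr),
                       alpha = 0.2, fill = "steelblue")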
    
          
