Draw vertical quantile lines over a histogram plot

I am currently generating the following plot using ggplot in R:

The data is stored in a single data frame with three columns: the PDF value (y-axis in the graph above), the bin midpoints (mids, x-axis), and the dataset name. It is built from histograms.
I want to plot a color-coded vertical line for each dataset marking its 95th percentile (the 0.95 quantile), like the one I drew by hand in the example below:

I tried adding + geom_line(stat="vline", xintercept="mean"), but I'm looking for quantiles, not the mean, and as far as I know ggplot does not support that. The colors are fine.
I also tried adding + stat_quantile(quantiles = 0.95), but I'm not sure what exactly it does; the documentation is very sparse. Again, the colors are fine.

Note that the density values are very small, down to 1e-8. I don't know whether the quantile() function copes with that.

I understand that computing a quantile from a histogram is not exactly the same as computing it from a list of numbers. I don't know whether it helps, but the HistogramTools package contains a function ApproxQuantile() for histogram quantiles.
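As a base-R alternative to a package function, a histogram quantile can be approximated by turning each bin's density into a probability mass (density times bin width), accumulating those masses into a CDF, and interpolating linearly inside the bin where the CDF first reaches p. The sketch below assumes a breaks vector like the one hist() returns; hist_quantile is a hypothetical helper written for illustration, not part of HistogramTools or any other package. Because the masses are normalised by their sum, uniformly tiny densities (such as values near 1e-8) should not change the result.

```r
# Hypothetical helper (not from any package): approximate the p-th quantile
# of a histogram by linear interpolation of its CDF within a single bin.
# `breaks` must have length(density) + 1 entries, as returned by hist().
hist_quantile <- function(breaks, density, p) {
  stopifnot(length(breaks) == length(density) + 1, p > 0, p < 1)
  widths <- diff(breaks)
  mass <- density * widths               # probability mass of each bin
  cdf <- cumsum(mass) / sum(mass)        # normalised CDF at each right bin edge
  i <- which(cdf >= p)[1]                # first bin whose CDF reaches p
  lower <- if (i == 1) 0 else cdf[i - 1] # CDF at the left edge of that bin
  breaks[i] + (p - lower) / (cdf[i] - lower) * widths[i]
}

# Usage with the data from the minimal example
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
hist_quantile(h$breaks, h$density, 0.95)
```

Linear interpolation assumes the data are spread uniformly within each bin, which is the usual approximation when only the histogram is available.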

A minimal working example is shown below. I build a data frame from each histogram, then combine the data frames with rbind() and plot them.

library(ggplot2)

# Sample data, binned into a histogram with unit-width bins from 0 to 100
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)

# One data frame per histogram: bin midpoints, density and dataset label
df1 <- data.frame(Bin = h$mids, Pdf = h$density, Dataset = "dataset1")
df2 <- data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
df_tot <- rbind(df1, df2)

# Plot only the non-empty bins
ggplot(data=df_tot[which(df_tot$Pdf>0),], aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
  geom_point(aes(color=Dataset), alpha = 0.7, size=1.5)


1 answer


Pre-calculating these values and plotting them separately seems to be the easiest option. With dplyr this takes minimal effort:

library(dplyr)

# Pre-compute the 0.95 quantile of Bin for each dataset
q.95 <- df_tot %>%
  group_by(Dataset) %>%
  summarise(Bin_q.95 = quantile(Bin, 0.95))

ggplot(data=df_tot[which(df_tot$Pdf>0),], 
       aes(x=Bin, y=Pdf, group=Dataset, colour=Dataset)) +
  geom_point(aes(color=Dataset), alpha = 0.7, size=1.5) + 
  # Draw the pre-computed values as colour-coded vertical lines
  geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
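One caveat: quantile(Bin, 0.95) treats every bin midpoint in df_tot as a single observation and ignores Pdf, so it returns a point 95% of the way along each dataset's bin range rather than the 95th percentile of the histogrammed data. A density-weighted variant is sketched below. It assumes that bins within each dataset have equal width (true in the example: width 1 for dataset1, width 2 for dataset2), so Pdf is proportional to each bin's share of the mass, and it picks the first bin whose cumulative share reaches 0.95. Its resolution is therefore one bin; the column name Bin_q.95 and the overall structure follow the answer above.

```r
library(dplyr)
library(ggplot2)

# Rebuild the example data from the question
v <- c(1:30, 2:50, 1:20, 1:5, 1:100, 1, 2, 1, 1:5, 0, 0, 0, 5, 1, 3, 7, 24, 77)
h <- hist(v, breaks = 0:100, plot = FALSE)
df_tot <- rbind(
  data.frame(Bin = h$mids,     Pdf = h$density,     Dataset = "dataset1"),
  data.frame(Bin = h$mids * 2, Pdf = h$density * 2, Dataset = "dataset2")
)

# Density-weighted 95th percentile per dataset, at bin resolution.
# Assumes equal bin widths within a dataset, so Pdf is proportional to bin mass.
q.95 <- df_tot %>%
  arrange(Dataset, Bin) %>%
  group_by(Dataset) %>%
  summarise(Bin_q.95 = Bin[which(cumsum(Pdf) / sum(Pdf) >= 0.95)[1]])

# Same plot as above, with one vertical line per dataset
p <- ggplot(df_tot[df_tot$Pdf > 0, ], aes(x = Bin, y = Pdf, colour = Dataset)) +
  geom_point(alpha = 0.7, size = 1.5) +
  geom_vline(data = q.95, aes(xintercept = Bin_q.95, colour = Dataset))
print(p)
```

Since geom_vline() does not inherit the plot's aesthetics by default, q.95 only needs the Dataset and Bin_q.95 columns.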
