R-plot "Heat map"

I have a matrix with x rows (i.e. number of draws) and y columns (number of observations). They represent the distribution of the forecasts.

Now I would like to make a heat map of the draws. That is, I want to plot a "confidence interval" (not really a confidence interval, just all the values with shading between them), but as a "heat map" (example heat map). This means that if, for example, many draws for an observation y = y* were around 1, but there was also a draw of 5 for the same observation, then the area of the confidence interval around 1 is darker (but everything between 1 and 5 is still shaded).

To be absolutely clear: I like, for example, the plot in the answer here, but I would like the gray confidence interval to be colored by intensity instead (i.e. some areas are darker).

Can someone please tell me how I could achieve this?

Thanks in advance.

Edit: As requested, sample data. An example of the first 20 values of the first column (i.e. Y[1:20, 1]):

 [1]  0.032067416 -0.064797792  0.035022338  0.016347263  0.034373065  0.024793101 -0.002514447  0.091411355 -0.064263536 -0.026808208
[11]  0.125831185 -0.039428744  0.017156454 -0.061574540 -0.074207109 -0.029171227  0.018906181  0.092816957  0.028899699 -0.004535961

      



2 answers


The tricky part is getting your data into the right shape, which is why it helps to share what your data actually looks like, not just one column.

Let's say your data is a matrix with 10,000 rows and 10 columns. I'll just use a uniform distribution, so the plot will be a boring one in the end.

n = 10000
k = 10
mat = matrix(runif(n * k), nrow = n)

      

Then we'll calculate the quantiles for each column using apply, transpose the result, and make it a data frame:

dat = as.data.frame(t(apply(mat, MARGIN = 2, FUN = quantile, probs = seq(.1, 0.9, 0.1))))

      

Add a variable x (since we transposed, each x value corresponds to a column of the original data):

dat$x = 1:nrow(dat)

      



Now we need to get it into "long" form, grouped so that each group holds the min and max values for one band of deviation around the median, and of course get rid of the annoying percent signs that quantile puts in the names:

library(dplyr)
library(tidyr)
dat_long = gather(dat, "quantile", value = "y", -x) %>%
    mutate(quantile = as.numeric(gsub("%", "", quantile)),
           group = abs(50 - quantile))

dat_ribbon = dat_long %>% filter(quantile < 50) %>%
    mutate(ymin = y) %>%
    select(x, ymin, group) %>%
    left_join(
        dat_long %>% filter(quantile > 50) %>%
        mutate(ymax = y) %>%
        select(x, ymax, group),
        by = c("x", "group")  # match each lower bound with its upper bound
    )

dat_median = filter(dat_long, quantile == 50)
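In current tidyr, gather() is marked as superseded in favor of pivot_longer(). As a hedged sketch, the same reshaping could be written as below. The seed and the smaller matrix are assumptions added only so the example is reproducible and quick to run; the variable names follow the answer.

```r
library(dplyr)
library(tidyr)

set.seed(1)                                    # assumption: fixed seed, for reproducibility only
mat <- matrix(runif(1000 * 10), nrow = 1000)   # assumption: smaller stand-in for the draws matrix
dat <- as.data.frame(t(apply(mat, 2, quantile, probs = seq(0.1, 0.9, 0.1))))
dat$x <- seq_len(nrow(dat))

# Long form: one row per (x, quantile) pair, with the "%" stripped from the names
dat_long <- dat %>%
  pivot_longer(cols = -x, names_to = "quantile", values_to = "y") %>%
  mutate(quantile = as.numeric(sub("%", "", quantile, fixed = TRUE)),
         group = abs(50 - quantile))

# Pair each quantile below the median with its mirror image above it
lower <- dat_long %>% filter(quantile < 50) %>% select(x, group, ymin = y)
upper <- dat_long %>% filter(quantile > 50) %>% select(x, group, ymax = y)
dat_ribbon <- left_join(lower, upper, by = c("x", "group"))

dat_median <- filter(dat_long, quantile == 50)
```

With nine quantiles and ten columns this gives four bands per x value, so dat_ribbon has 40 rows and dat_median has 10.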

      

Finally, we can plot. We will draw a transparent ribbon for each "group", that is, the 10%-90%, 20%-80%, 30%-70%, and 40%-60% ranges, and then one line at the median (50%). Using transparency makes the middle darker, since more ribbons overlap there. As written, the shading doesn't go from minimum to maximum, but it will if you set probs in the quantile call to go from 0 to 1 instead of .1 to .9.

library(ggplot2)
ggplot(dat_ribbon, aes(x = x)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax, group = group), alpha = 0.2) +
    geom_line(aes(y = y), data = dat_median, color = "white")
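To illustrate the remark about running probs from 0 to 1, here is a minimal sketch in which the outermost ribbon spans the minimum and maximum of each column. It builds the ribbon data with base R instead of dplyr, which is a substitution made here for brevity, and the seed, the normal draws, and the band column name are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)                               # assumption: fixed seed, for reproducibility only
n <- 1000; k <- 10
mat <- matrix(rnorm(n * k), nrow = n)     # assumption: stand-in for the draws matrix

probs <- seq(0, 1, 0.1)                   # 0% and 100% are the column min and max
q <- t(apply(mat, 2, quantile, probs = probs))   # k rows, one column per quantile

# Pair the i-th quantile from the bottom with the i-th from the top
half <- (length(probs) - 1) / 2           # number of bands (5 here)
dat_ribbon <- do.call(rbind, lapply(seq_len(half), function(i) {
  data.frame(x = seq_len(k),
             ymin = q[, i],
             ymax = q[, length(probs) + 1 - i],
             band = i)
}))
dat_median <- data.frame(x = seq_len(k), y = q[, half + 1])

p <- ggplot(dat_ribbon, aes(x = x)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax, group = band), alpha = 0.2) +
  geom_line(aes(y = y), data = dat_median, color = "white")
print(p)
```

Band 1 is the full min-to-max range, so a single extreme draw still gets light shading while the region near the median stays darkest.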

      


It's worth noting that this is not a regular heatmap. A heatmap usually assumes that you have three variables, x, y and z (color), with a z-value for each x-y pair. Here you have only two variables, x and y, with y depending on x.
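If a heatmap in that three-variable sense is what you are after, one possible approach (a sketch, not part of the method above) is to create z yourself by counting how many draws fall into each y-bin for every x. The bin height of 0.25, the seed, and the normal draws are arbitrary assumptions for illustration.

```r
library(ggplot2)

set.seed(1)                               # assumption: fixed seed, for reproducibility only
n <- 1000; k <- 10
mat <- matrix(rnorm(n * k), nrow = n)     # assumption: stand-in for the draws matrix

# Long form: one row per draw; as.vector() reads the matrix column by column
draws <- data.frame(x = rep(seq_len(k), each = n), y = as.vector(mat))

bin_width <- 0.25                         # assumption: arbitrary bin height
breaks <- seq(floor(min(draws$y)), ceiling(max(draws$y)), by = bin_width)
draws$bin <- cut(draws$y, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# z = number of draws in each (x, y-bin) cell
counts <- as.data.frame(table(x = draws$x, bin = draws$bin), responseName = "z")
counts$x <- as.integer(as.character(counts$x))
counts$bin <- as.integer(as.character(counts$bin))
counts$y <- breaks[counts$bin] + bin_width / 2   # bin midpoints
counts <- counts[counts$z > 0, ]                 # leave empty cells unshaded

p <- ggplot(counts, aes(x = x, y = y, fill = z)) +
  geom_tile(width = 1, height = bin_width) +
  scale_fill_gradient(low = "grey90", high = "black")
print(p)
```

Unlike the ribbon plot, only cells that actually contain draws are shaded, so the space between an isolated outlier and the bulk of the distribution stays empty.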



It's not much, but I would probably start with the hexbin package (or hexbinplot). Several alternatives are presented in this SO post:



Formatting and managing graphics from the R "hexbin" package
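As a rough sketch of what a starting point with hexbin might look like, assuming the draws are first put into long form (one row per draw); the seed, the normal draws, and xbins = 20 are arbitrary assumptions:

```r
library(hexbin)

set.seed(1)                               # assumption: fixed seed, for reproducibility only
n <- 1000; k <- 10
mat <- matrix(rnorm(n * k), nrow = n)     # assumption: stand-in for the draws matrix

# Long form: x is the observation index, y is the drawn value
draws <- data.frame(x = rep(seq_len(k), each = n), y = as.vector(mat))

# Count the draws falling into each hexagonal cell, then plot the counts
hb <- hexbin(draws$x, draws$y, xbins = 20)
plot(hb)
```

Because x only takes integer values here, the hexagons line up in columns; a rectangular binning may read more naturally for this kind of data.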
