Generating a heatmap using R or Python

This question probably doesn't fit SO, but I'm looking for a way (in R or Python, preferably R) to generate a heatmap for data with two extremes. Consider the following data.

+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| …  |     X1      |     X2      |     X3      |     X4      |     X5      |     X6      |     X7      |     X8      |     X9      |     X10     |     X11     |     X12     |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|  1 | 0.960023745 | 0.006412462 | 0.002413886 | 1.75E-06    | 1.33E-07    | 6.53E-07    | 0.000789362 | 1.56E-07    | 0.027248026 | 2.54E-05    | 0.000108822 | 0.002949816 |
|  2 | 0.013783554 | 0.960582857 | 0.010711838 | 0.003933983 | 0.002573642 | 0.001472307 | 0.000319789 | 0.000195265 | 1.87E-05    | 1.29E-06    | 0.004194081 | 0.002209041 |
|  3 | 0.000839561 | 0.005466858 | 0.944159921 | 0.023892784 | 0.001752099 | 0.000828122 | 0.000493376 | 1.84E-06    | 0.011739846 | 0.000879784 | 9.53E-05    | 0.00980562  |
|  4 | 2.26E-08    | 0.004108291 | 0.010781282 | 0.966410413 | 0.010459999 | 3.04E-05    | 1.64E-06    | 0.001983494 | 0           | 0.000225223 | 0.002846474 | 0.0031448   |
|  5 | 0           | 0.003175902 | 0.002023363 | 0.010022482 | 0.919020424 | 0.032083951 | 0.001814906 | 0.030203657 | 2.02E-06    | 7.07E-05    | 0.001165208 | 0.000413012 |
|  6 | 7.34E-08    | 0.002817014 | 0.000931738 | 7.01E-05    | 0.026999736 | 0.947850807 | 0.003017895 | 0.017994113 | 0           | 0.00011791  | 0.000194055 | 0           |
|  7 | 0.001857195 | 0.000220267 | 0.001523402 | 1.23E-05    | 0.001915852 | 0.010193007 | 0.960227998 | 0.012040256 | 0.007093175 | 0.001441301 | 0.002149965 | 0.001306157 |
|  8 | 0           | 0.000337953 | 0           | 0.00536237  | 0.030409165 | 0.01670267  | 0.009929247 | 0.936720524 | 0           | 0           | 0.000503316 | 3.12E-05    |
|  9 | 0.00350741  | 2.38E-06    | 0.002294787 | 1.17E-06    | 9.38E-08    | 8.74E-08    | 0.000252812 | 4.25E-10    | 0.984092182 | 0.003173648 | 2.42E-05    | 0.006649569 |
| 10 | 0.000126558 | 4.85E-05    | 0.001686418 | 0.000202837 | 3.87E-05    | 9.82E-05    | 0.000425687 | 0           | 0.013116146 | 0.983428814 | 5.28E-05    | 0.000776452 |
| 11 | 0.000170592 | 0.002728779 | 0.000117028 | 0.002794149 | 0.000621607 | 0.000224662 | 0.000969203 | 0.000299963 | 0.000629235 | 4.68E-05    | 0.991344498 | 5.02E-05    |
| 12 | 0.004371355 | 0.001246307 | 0.02523568  | 0.007498292 | 0.000186287 | 6.00E-07    | 0.000956249 | 2.93E-05    | 0.0590514   | 0.001253133 | 8.40E-05    | 0.900059314 |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

      

Consider the first row. The entry in column X1 is very large compared to the rest of the row. The same holds for the diagonal entry of every other row. The heatmap generated from this data looks like this:

[image: heatmap of the data above]

As you can see, the diagonal is very strong compared to the other cells (this is evident from the data and is actually expected). I'm just trying to find a way to "darken" the other colors. I am mainly looking for a ggplot solution. Nothing I've tried so far works.

Here is the R code I'm using now:

heatmap(data.matrix(result_matrix), Rowv=NA, Colv=NA, col = rev(heat.colors(256)), margins=c(5,10))

      



1 answer


The basic idea is to put the fill colors on a logarithmic scale. Here is a ggplot solution.

library(ggplot2)
library(reshape2)
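# assumption: df holds the table from the question (result_matrix there)
df <- as.data.frame(result_matrix)
df$id <- rownames(df)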
gg <- melt(df,id="id")
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="log10",na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))

      

The key here is trans="log10" in the call to scale_fill_gradientn(...). One problem with logs is that you have zeros in your data, and these get converted to NA. The use of na.value="white" deals with this (you can make it a different color if appropriate in your use case).
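A quick base-R check shows what happens to the zeros. The values below are copied from the first row of the table, plus one zero like those in the other rows. The `floor_val` replacement at the end is only an assumption about what you might prefer, not part of the answer above: it would color the zero cells like the smallest positive value instead of leaving them at na.value.

```r
# Three values from the first row of the table, plus a zero
vals <- c(0.960023745, 0.006412462, 1.75e-06, 0)

# log10(0) is -Inf, which has no position on the colour scale,
# so that tile ends up drawn with na.value
log_vals <- log10(vals)
print(log_vals)
print(is.finite(log_vals))

# Optional workaround (an assumption, not from the answer):
# raise zeros to the smallest positive value before taking logs
floor_val <- min(vals[vals > 0])
vals_floored <- pmax(vals, floor_val)
print(log10(vals_floored))
```

If you go that route, apply the floor to gg$value before plotting and mention it in the caption, since true zeros are then no longer distinguishable from the smallest positive value.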

The calls to scale_x... and scale_y... are only there to tighten the axes so that the entire area is covered with tiles (ggplot adds a bit of empty space by default, which detracts from heat maps).

EDIT: Response to OP's comment.



This business of "making the diagonal pop more" is an aesthetic choice that has almost nothing to do with the data and is likely to lead to misleading graphics. I don't recommend it. Having said that, you can always choose a different transformation.

# reorder the y-axis  - should not be necessary
gg$id <- factor(gg$id,levels=unique(gg$id))  # should not be necessary...

# square root scale
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="sqrt",na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))

      

#logit scale; need to set breaks=... to avoid labels overlapping
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="logit",na.value="white",breaks=5*10^-(0:8))+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))
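To compare the transformations without plotting, the following base-R sketch computes where a few values land on the color scale (0 = low end of the palette, 1 = high end) after each transformation is applied and the result is rescaled to the data range. The smallest positive and largest values are taken from the table; the 0.01 and 0.96 entries are illustrative stand-ins for a typical off-diagonal and diagonal cell, and the helper `position()` is hypothetical.

```r
# Smallest positive and largest values from the table, plus two
# illustrative cells (a typical off-diagonal and a typical diagonal value)
vals <- c(min = 4.25e-10, off_diag = 0.01, diag = 0.96, max = 0.991344498)

# Apply a transformation, then rescale linearly to [0, 1],
# mimicking how a continuous fill scale spreads colours
position <- function(f) {
  tv <- f(vals)
  round((tv - min(tv)) / (max(tv) - min(tv)), 2)
}

positions <- rbind(identity = position(identity),
                   sqrt     = position(sqrt),
                   log10    = position(log10),
                   logit    = position(qlogis))
print(positions)
```

The off_diag column is the one to look at: the closer it sits to the diag column, the less the diagonal stands out from the rest of the map. Zeros are left out here because log10 and logit both send them to -Inf.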

      
