"Bin" continuous values ​​in ggplot2 based on criteria to produce sharper colors (eg factor level paint)?

Currently I am just using something like this:

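# Start with an empty label, then fill it in one bin at a time.
# Indexing the column (rather than the row subset) avoids an error
# when a condition matches no rows.
test_data$level <- rep("", nrow(test_data))
test_data$level[test_data$value <= 1] <- "1"
test_data$level[test_data$value > 1 & test_data$value <= 2] <- "2"
...
test_data$level[test_data$value > 4 & test_data$value <= 5] <- "5"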

      

Just wondering if there is a better way to do this in R, or whether I can just apply some scale argument in ggplot2 to do the categorization.


There may be several approaches to this, which made it hard to phrase my question precisely. Here's the gist: I have data that looks something like this:

set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))
head(test_data, 10)
   var1 var2    value
1     A    a 2.150310
2     A    b 4.153221
3     A    c 2.635908
4     A    d 4.532070
5     A    e 4.761869
6     B    a 1.182226
7     B    b 3.112422
8     B    c 4.569676
9     B    d 3.205740
10    B    e 2.826459

      

I have a lot more data points and I am drawing something like this:

library(ggplot2)
p <- ggplot(test_data, aes(x = var1, y = var2, colour = value))
p <- p + geom_jitter(position = position_jitter(width = 0.1, height = 0.1))
p

      

Which gives something like this:

[image: resulting plot, points colored on the default continuous gradient]

My actual data comes from a 1-5 subjective assessment, but I grouped similar questions and averaged them, so the scores are no longer whole numbers.

I plot the scores against a combination of factors to visualize which combinations gave higher scores. The default continuous scale doesn't "pop", and I would like the color scale to handle "bins" of these values (0-1, 1-2, ... 4-5) so that they are colored the way scale_colour_discrete colors factors.

So my question(s):

1) Is it possible to have ggplot2 "bin" the values somehow through scale_colour_continuous so that I can use the default coloring scheme for factors even though the data are continuous?

2) If not, is there an easier way to create a new vector where I substitute numbers / letters for my values based on criteria? I'm a bit of an R newbie, so I wasn't sure how to do it other than with a heap of if() or conditional statements (test_data[test_data$value > 0 & test_data$value < 1, "level"] <- "a" or something like that).

+3




3 answers


The simplest solution is to do

ggplot(transform(test_data, Discrete = cut(value, seq(0, 5, 1), include.lowest = TRUE)), ...)

      

The data.frame that ggplot sees will now contain a factor column, Discrete, based on the column value, so you can use aes(..., color = Discrete, ...). That column exists ONLY in the context of this ggplot call: transform() works on a copy, so test_data itself keeps its original format once the plot has been drawn, and there is nothing to clean up afterwards.



To keep the discrete column permanently, your best option is of course:

test_data$Discrete <- cut(test_data$value, seq(0, 5, 1), include.lowest = TRUE)
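
For what it's worth, here is a minimal, self-contained sketch of the cut() approach on the sample data from the question. The column name Discrete comes from the answer; the drop = FALSE setting and the object name p are my own additions for illustration, not part of the original answer.

```r
# Sketch: bin a continuous column with cut() and colour by the resulting factor.
# Assumes the sample data from the question.
library(ggplot2)

set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

# Breaks 0, 1, ..., 5 give five bins. include.lowest = TRUE closes the first
# bin on the left, so a value of exactly 0 would not be turned into NA.
test_data$Discrete <- cut(test_data$value, breaks = seq(0, 5, 1),
                          include.lowest = TRUE)

levels(test_data$Discrete)   # the five bin labels
table(test_data$Discrete)    # how many points fall into each bin

# Mapping colour to a factor makes ggplot2 choose a discrete colour scale.
# drop = FALSE keeps unused bins in the scale, so each bin keeps the same
# colour even when it happens to be empty in a given data set.
p <- ggplot(test_data, aes(x = var1, y = var2, colour = Discrete)) +
  geom_jitter(position = position_jitter(width = 0.1, height = 0.1)) +
  scale_colour_discrete(drop = FALSE)
p
```

If the bin edges are not evenly spaced, the same call should work with any increasing vector of breaks in place of seq(0, 5, 1).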

      

+5




You can switch from the colour bar legend to a discrete-style legend by setting guide = 'legend'.

library(RColorBrewer) # for brewer.pal
ggplot(test_data, aes(x = var1, y = var2, colour = value)) +
  geom_jitter(position = position_jitter(width = 0.1, height = 0.1)) +
  scale_colour_gradientn(guide = 'legend', colours = brewer.pal(n = 5, name = 'Set1'))

      



[image: resulting plot with a discrete-style legend]
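
Note that the scale above is still a continuous gradient; only the legend looks discrete. If your installed ggplot2 is version 3.3.0 or later (an assumption on my part, it is not something the question states), a binned colour scale might be closer to what question 1 asks for, because the binning happens inside the scale. A rough sketch, using scale_colour_viridis_b as one of the available binned scales and p_binned as a made-up object name:

```r
# Sketch: let the colour scale do the binning (assumes ggplot2 >= 3.3.0).
library(ggplot2)

set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

p_binned <- ggplot(test_data, aes(x = var1, y = var2, colour = value)) +
  geom_jitter(position = position_jitter(width = 0.1, height = 0.1)) +
  # The limits plus the interior breaks define the bins 0-1, 1-2, ..., 4-5.
  scale_colour_viridis_b(limits = c(0, 5), breaks = 1:4)
p_binned

# Inspect the colours actually assigned to the points:
# there should be at most one distinct colour per bin.
unique(layer_data(p_binned)$colour)
```

The data stay continuous here, so no extra column is needed; the trade-off is that the palette is a stepped gradient rather than the default factor colours.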

+2




Literally as I was posting an update with my current method, I thought of another way to do it:

p <- ggplot(test_data, aes(x = var1, y = var2, colour = factor(value)))
p <- p + geom_jitter(position = position_jitter(width = 0.1, height = 0.1))
p <- p + scale_colour_discrete(breaks = 1:5)
p

      

Stupidly simple: just force the continuous values to be treated as individual factor levels and then manipulate the breaks of the color ramp through ggplot2. I see there are some other answers as well. I'm not familiar with those methods yet, so I guess I'll upvote them and accept whichever solves this best.
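
One thing that may be worth checking with this approach: factor() on non-integer averages creates one level per distinct value, so breaks of 1:5 only match when the values are whole numbers. A possible variant, sketched below under the assumption that the bins are (0,1], (1,2], ..., (4,5], rounds each value up to its bin's upper bound first. The column name bin and the object name p_bins are hypothetical.

```r
# Sketch: collapse averaged scores onto the bin upper bounds before factor().
library(ggplot2)

set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

# On the raw averages, every distinct value becomes its own factor level.
nlevels(factor(test_data$value))

# ceiling() maps (0,1] to 1, (1,2] to 2, and so on. Fixing the levels to 1:5
# keeps the legend stable; a value of exactly 0 would become NA here.
test_data$bin <- factor(ceiling(test_data$value), levels = 1:5)
table(test_data$bin)

p_bins <- ggplot(test_data, aes(x = var1, y = var2, colour = bin)) +
  geom_jitter(position = position_jitter(width = 0.1, height = 0.1)) +
  # Breaks are given as character so that they match the factor levels.
  scale_colour_discrete(breaks = as.character(1:5))
p_bins
```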

+1

