"Binning" continuous values in ggplot2 based on criteria to get more striking colors (e.g. factor-level coloring)?
Currently I am just using something like this:
test_data$level <- rep("", nrow(test_data))
test_data[test_data$value <= 1, ]$level <- "1"
test_data[test_data$value > 1 & test_data$value <= 2, ]$level <- "2"
...
test_data[test_data$value > 4 & test_data$value <= 5, ]$level <- "5"
I'm just wondering whether there is a better way to do this in R, or whether I can simply pass some scale argument to ggplot2 to do the categorization.
There may be several approaches to this, so it was difficult to formulate my question precisely. Here's the gist: I have data something like this:
set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
var2 = rep(letters[1:5], 3),
value = runif(30, 1, 5))
head(test_data, 10)
   var1 var2    value
1     A    a 2.150310
2     A    b 4.153221
3     A    c 2.635908
4     A    d 4.532070
5     A    e 4.761869
6     B    a 1.182226
7     B    b 3.112422
8     B    c 4.569676
9     B    d 3.205740
10    B    e 2.826459
I have a lot more data points and I am drawing something like this:
library(ggplot2)
p <- ggplot(test_data, aes(x = var1, y = var2, colour = value))
p <- p + geom_jitter(position = position_jitter(width = 0.1, height = 0.1))
p
Which gives something like this:
My actual data comes from a 1-5 subjective rating, but I grouped similar questions and averaged them, so the values are no longer whole numbers.
I plot the scores against combinations of factors to see which combinations gave higher scores. The default continuous scale doesn't "pop", and I would like the color scale to treat these values as bins (0-1, 1-2, ... 4-5) so that they are colored the way scale_colour_discrete colors factors.
So my question(s):
1) Is it possible to have ggplot2 "bin" the values somehow through scale_colour_continuous, so that I can use the default factor coloring scheme even though the data is continuous?
2) If not, is there an easier way to create a new vector in which I substitute numbers/letters for my values based on criteria? I'm a bit of an R newbie, so I wasn't sure how to do it other than with a heap of if() or conditional statements (test_data[test_data > 0 & test_data < 1, "values"] <- "a" or something like that).
The simplest solution is to do
ggplot(transform(test_data, Discrete = cut(value, seq(0, 5, 1), include.lowest = TRUE)), ...
The data.frame passed to ggplot will now contain a factor column based on the value column, so you can use aes(..., color = Discrete, ...) ONLY in the context of that ggplot call. test_data itself is not modified and is unchanged after the plot is drawn, so there is no need to worry about it.
If you want to keep the discrete column, your best option is of course:
test_data$Discrete <- cut(test_data$value, seq(0, 5, 1), include.lowest = TRUE)
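As a rough sketch of how this relates to the manual assignments in the question: `cut()` closes intervals on the right by default, so it produces the same (lower, upper] bins, and its `labels` argument can supply the "1" to "5" names directly. The column name `level` follows the question; `manual` is a hypothetical name used only for the comparison.

```r
# Rebuild the example data from the question
set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

# Bins are (0,1], (1,2], ..., (4,5]; include.lowest = TRUE also keeps an exact 0
test_data$level <- cut(test_data$value, breaks = 0:5,
                       labels = as.character(1:5), include.lowest = TRUE)

# Hypothetical check: the question's manual rule, written with ceiling()
manual <- as.character(ceiling(test_data$value))
all(as.character(test_data$level) == manual)

# Count how many observations fall into each bin
table(test_data$level)
```

Because the result is a factor with all five levels, empty bins still appear in `table()` and can be kept in a plot's color scale.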
You can switch from the color bar legend to a discrete-style legend (the colors themselves are still interpolated continuously).
library(RColorBrewer) # for brewer.pal
ggplot(test_data, aes(x = var1, y = var2, colour = value)) +
geom_jitter(position = position_jitter(width = 0.1, height = 0.1)) +
scale_colour_gradientn(guide = 'legend', colours = brewer.pal(n = 5, name = 'Set1'))
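If the installed ggplot2 is version 3.3.0 or later (an assumption; this thread may predate it), the binned color scales can do the binning inside the scale itself, which is closer to what question 1 asks for. A minimal sketch using `scale_colour_stepsn()` with the same Brewer palette:

```r
library(ggplot2)
library(RColorBrewer)  # for brewer.pal

# Rebuild the example data from the question
set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

p <- ggplot(test_data, aes(x = var1, y = var2, colour = value)) +
  geom_jitter(width = 0.1, height = 0.1) +
  # limits fix the range to 0-5 and breaks give the inner bin edges,
  # so each of the five bins is drawn in one flat colour taken from the palette gradient
  scale_colour_stepsn(colours = brewer.pal(n = 5, name = "Set1"),
                      limits = c(0, 5), breaks = 1:4)
p
```

The data stays continuous here, so no extra column is needed; only the mapping from value to color is stepped.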
Literally as I was posting an update with my current method, I thought of another way to do it ...
p <- ggplot(test_data, aes(x = var1, y = var2, colour = factor(value)))
p <- p + geom_jitter(position = position_jitter(width = 0.1, height = 0.1))
p <- p + scale_colour_discrete(breaks = 1:5)
p
Stupidly simple: just force the continuous values to be treated as individual factor levels and then manipulate the color breaks through ggplot2. I see there are some other answers as well; I'm not familiar with those methods yet, so I guess I'll upvote them and accept the best one.
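One caveat with the snippet above: `factor(value)` creates a separate level for every distinct averaged score, so the breaks `1:5` may not match any level. A possible variant, sketched under the assumption that the bins should follow the (0,1], (1,2], ... rule from the question, rounds each value up before converting it to a factor. The column name `bin` is hypothetical.

```r
library(ggplot2)

# Rebuild the example data from the question
set.seed(123)
test_data <- data.frame(var1 = rep(LETTERS[1:3], each = 5),
                        var2 = rep(letters[1:5], 3),
                        value = runif(30, 1, 5))

# ceiling() maps (0,1] -> 1, (1,2] -> 2, ..., (4,5] -> 5;
# fixing the levels to 1:5 keeps all five bins even if some are empty
test_data$bin <- factor(ceiling(test_data$value), levels = 1:5)

p <- ggplot(test_data, aes(x = var1, y = var2, colour = bin)) +
  geom_jitter(width = 0.1, height = 0.1) +
  # drop = FALSE keeps unused levels in the scale, so each bin keeps the same colour
  scale_colour_discrete(drop = FALSE)
p
```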