ggplot2 plots more points than there are in the data frame (geom_point + facet_grid)
I have some data and I am trying to make boxplots with jittered points overlaid. My problem is with the points, so we'll stick to that.
Here's the data:
> dput(test)
structure(list(var1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("A", "B", "C", "D",
"E", "F", "G", "H", "I"), class = "factor"), var2 = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7"), class = "factor"), response1 = c(5L,
6L, 5L, 5L, 5L, 5L, 4L, 6L, 6L, 5L, 5L, 6L, 6L, 4L, 1L, 1L, NA,
1L, NA, NA, 1L, 1L, 1L, NA, 1L, NA, NA, 1L, 5L, 5L, 4L, 5L, 3L,
2L, 3L, 1L, 1L, NA, 1L, NA, NA, 1L, NA, NA, 2L, NA, 3L, 1L, NA,
NA, NA, 4L, NA, 4L, 5L, NA, NA, NA, 1L, NA, 1L, 1L, NA), response2 = c(2L,
2L, 2L, 2L, 2L, 2L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 5L, 5L, NA,
5L, NA, NA, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, NA, NA, 5L, NA, 5L, 5L, NA,
NA, NA, 5L, NA, 5L, 5L, NA, NA, NA, 5L, NA, 5L, 5L, NA), response3 = c(4L,
5L, 1L, 1L, 4L, 1L, 1L, 4L, 5L, 1L, 1L, 5L, NA, 1L, 4L, NA, NA,
NA, 3L, 2L, NA, 4L, NA, NA, NA, 3L, NA, NA, 4L, NA, 1L, NA, 3L,
NA, 2L, 4L, NA, NA, NA, NA, NA, NA, NA, 2L, 1L, 1L, NA, NA, 1L,
NA, 3L, 1L, NA, NA, NA, 1L, NA, 3L, 1L, NA, NA, NA, 1L)), .Names = c("var1",
"var2", "response1", "response2", "response3"), class = "data.frame", row.names = c(NA,
-63L))
I used the melt command from reshape2 to reshape my data for plotting and faceting:
library(reshape2)
test_melted <- melt(test, id.vars = c("var1", "var2"), na.rm = TRUE)
And here was the plot I created:
library(ggplot2)
p <- ggplot(test_melted, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p
This gives the following:
It looked fine at first, but then I noticed that there seemed to be more points per facet / factor level than there should be. I narrowed it down to one level of var1:
test_subset <- test_melted[test_melted$var1 == "E", ]
nrow(test_subset)
[1] 18
summary(test_subset)
      var1    var2       variable     value
 E      :18   V1:3   response1:7   Min.   :1
 A      : 0   V2:2   response2:7   1st Qu.:3
 B      : 0   V3:3   response3:4   Median :5
 C      : 0   V4:2                 Mean   :4
 D      : 0   V5:3                 3rd Qu.:5
 F      : 0   V6:2                 Max.   :5
 (Other): 0   V7:3
So we should have 18 points in total (7 for response1, 7 for response2 and 4 for response3). Let's try:
p <- ggplot(test_subset, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p
I am counting 11 points in the response1 facet, 8 in response2 and 8 in response3.
It must be something stupid that I am missing. I've made a lot of dot plots and this has never happened before (or I never noticed!).
Things I have tried:
- Removing coord_flip()
- test_subset <- droplevels(test_subset), in case empty factor levels were messing with something
- Playing with facet_grid(~variable) vs. facet_grid(.~variable) vs. facet_grid(variable~) vs. facet_grid(variable~.)
As a final note, I get a different number of points depending on whether the plot is faceted or not. Faceted, I get 11 + 8 + 8 = 27; if I remove facet_grid(~variable), I get 23.
Thanks for any suggestions!
The problem is not related to faceting; it comes from the fact that your plot uses two geoms. geom_point draws your points at their exact positions, and then geom_jitter draws them again at random positions. This is why you see extra points in every panel.
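As a rough check of this explanation, the counts reported in the question can be reproduced in base R. The sketch below retypes the level "E" values from the dput() output and assumes that identical values in the geom_point layer overplot into one visible point, while jittered points never land exactly on top of each other. The names e_values, jittered and distinct are illustrative only.

```r
# Level "E" of var1, values copied from the dput() output in the question
# (NAs already dropped, as melt(..., na.rm = TRUE) does).
e_values <- data.frame(
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,   # response1
               5, 5, 5, 5, 5, 5, 5,   # response2
               4, 1, 3, 2)            # response3
)

# geom_jitter(): one visible point per row (assuming no exact overlaps).
jittered <- table(e_values$variable)

# geom_point(): identical values overplot, so only the distinct values
# within each facet remain visible.
distinct <- tapply(e_values$value, e_values$variable,
                   function(v) length(unique(v)))

visible_faceted <- as.vector(jittered) + as.vector(distinct)
names(visible_faceted) <- names(distinct)
print(visible_faceted)       # response1 = 11, response2 = 8, response3 = 8
print(sum(visible_faceted))  # 27

# Without facets everything shares one panel, so distinct values are
# counted once across all rows: 18 jittered + 5 distinct = 23.
visible_unfaceted <- nrow(e_values) + length(unique(e_values$value))
print(visible_unfaceted)     # 23
```

Under those assumptions the totals match the 11 + 8 + 8 = 27 (faceted) and 23 (unfaceted) counted in the question.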
If you remove the geom_point call, everything is back to normal:
p <- ggplot(test_subset, aes(x = var1, y = value))
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p
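The same double drawing can happen once the boxplots mentioned in the question are added, because geom_boxplot draws its own outlier points. The following is only a sketch of one way to avoid that, not necessarily what the asker ended up using: it rebuilds the level "E" subset by hand so that it runs on its own, and hides the boxplot outliers with outlier.shape = NA so that each observation appears once, as a jittered point.

```r
library(ggplot2)

# Hand-built stand-in for test_subset (level "E" only), values taken from
# the dput() output in the question.
test_subset <- data.frame(
  var1     = "E",
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,
               5, 5, 5, 5, 5, 5, 5,
               4, 1, 3, 2)
)

p <- ggplot(test_subset, aes(x = var1, y = value)) +
  geom_boxplot(outlier.shape = NA) +        # do not draw outliers a second time
  geom_jitter(width = 0.2, height = 0.2) +  # the only layer that draws points
  facet_grid(~variable) +
  coord_flip()
p
```

With a single point layer, the number of plotted points should equal the number of rows in the data (18 here).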