ggplot2 plots more points than are in the data frame: geom_point + facet_grid

I have some data and I am trying to make boxplots with jittered points overlaid. My problem is with the points, so we'll stick to those.

Here's the data:

> dput(test)
structure(list(var1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("A", "B", "C", "D", 
"E", "F", "G", "H", "I"), class = "factor"), var2 = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7"), class = "factor"), response1 = c(5L, 
6L, 5L, 5L, 5L, 5L, 4L, 6L, 6L, 5L, 5L, 6L, 6L, 4L, 1L, 1L, NA, 
1L, NA, NA, 1L, 1L, 1L, NA, 1L, NA, NA, 1L, 5L, 5L, 4L, 5L, 3L, 
2L, 3L, 1L, 1L, NA, 1L, NA, NA, 1L, NA, NA, 2L, NA, 3L, 1L, NA, 
NA, NA, 4L, NA, 4L, 5L, NA, NA, NA, 1L, NA, 1L, 1L, NA), response2 = c(2L, 
2L, 2L, 2L, 2L, 2L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 5L, 5L, NA, 
5L, NA, NA, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, NA, NA, 5L, NA, 5L, 5L, NA, 
NA, NA, 5L, NA, 5L, 5L, NA, NA, NA, 5L, NA, 5L, 5L, NA), response3 = c(4L, 
5L, 1L, 1L, 4L, 1L, 1L, 4L, 5L, 1L, 1L, 5L, NA, 1L, 4L, NA, NA, 
NA, 3L, 2L, NA, 4L, NA, NA, NA, 3L, NA, NA, 4L, NA, 1L, NA, 3L, 
NA, 2L, 4L, NA, NA, NA, NA, NA, NA, NA, 2L, 1L, 1L, NA, NA, 1L, 
NA, 3L, 1L, NA, NA, NA, 1L, NA, 3L, 1L, NA, NA, NA, 1L)), .Names = c("var1", 
"var2", "response1", "response2", "response3"), class = "data.frame", row.names = c(NA, 
-63L))

      

I used the reshape2 package to melt my data for plotting / faceting:

library(reshape2)
test_melted <- melt(test, id.vars = c("var1", "var2"), na.rm = TRUE)

      

And here was the plot I created:

library(ggplot2)
p <- ggplot(test_melted, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p

      

This gives the following:

[image: resulting plot]

Looked okay enough, but then I noticed that there seemed to be more points per facet / factor level than there should be. I narrowed it down to one level of var1:

test_subset <- test_melted[test_melted$var1 == "E", ]

nrow(test_subset)
[1] 18

summary(test_subset)
      var1    var2        variable     value  
 E      :18   V1:3   response1:7   Min.   :1  
 A      : 0   V2:2   response2:7   1st Qu.:3  
 B      : 0   V3:3   response3:4   Median :5  
 C      : 0   V4:2                 Mean   :4  
 D      : 0   V5:3                 3rd Qu.:5  
 F      : 0   V6:2                 Max.   :5  
 (Other): 0   V7:3 

      

So we should have 18 points in total (7 for response1, 7 for response2 and 4 for response3). Let's try:

p <- ggplot(test_subset, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p

      

[image: resulting plot]

I count 11 points in the response1 facet, 8 in response2 and 8 in response3.

It must be something stupid that I am missing. I've made a lot of dot plots and this has never happened before (or I never noticed!).

Things I have tried:

  • Removing coord_flip()
  • test_subset <- droplevels(test_subset), in case empty factor levels were messing with something
  • Playing with facet_grid(~variable) vs. facet_grid(.~variable) vs. facet_grid(variable~) vs. facet_grid(variable~.)

As a final note, I get a different number of points depending on whether I facet or not. Faceted, I get 11 + 8 + 8 = 27; if I remove facet_grid(~variable), I get 23.

Thanks for any suggestions!



1 answer


The problem is not related to faceting; it comes from using two geoms in your plot. geom_point draws your points at their exact positions, and then geom_jitter draws them again at random positions. This is why you see extra points in every panel.
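The exact counts in the question (11, 8, 8, and 23 without facets) are consistent with this. Unjittered points that share the same coordinates overlap, so geom_point adds one visible point per distinct value in a panel, while geom_jitter adds one per row. Here is a rough check in base R, with the var1 == "E" values copied by hand from the question's dput output; the names rows_per_facet, distinct_per_facet and visible_per_facet are only for illustration:

```r
# var1 == "E" subset, rebuilt by hand from the question's data (NAs already dropped)
test_subset <- data.frame(
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,  5, 5, 5, 5, 5, 5, 5,  4, 1, 3, 2)
)

# geom_jitter: one point per row
rows_per_facet <- tapply(test_subset$value, test_subset$variable, length)

# geom_point: identical (x, y) positions overlap, so one visible point per distinct value
distinct_per_facet <- tapply(test_subset$value, test_subset$variable,
                             function(v) length(unique(v)))

visible_per_facet <- rows_per_facet + distinct_per_facet
print(visible_per_facet)   # 11 8 8

# Without facets, all rows share one panel
visible_unfaceted <- nrow(test_subset) + length(unique(test_subset$value))
print(visible_unfaceted)   # 23
```

This assumes no two jittered points happen to land exactly on top of each other or on an unjittered point, which is what the counts in the question suggest.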

If you remove the geom_point call, everything is back to normal:



p <- ggplot(test_subset, aes(x = var1, y = value))
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p

      

[image: resulting plot]
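Instead of counting dots by eye, you can also ask ggplot2 how many rows each layer draws. The sketch below assumes ggplot2 is installed and rebuilds the var1 == "E" subset by hand from the question's data; base, p_both and p_jitter are illustrative names. It uses layer_data(), which returns the data that a given layer plots:

```r
library(ggplot2)

# var1 == "E" subset, rebuilt by hand from the question's dput() output
test_subset <- data.frame(
  var1     = "E",
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,  5, 5, 5, 5, 5, 5, 5,  4, 1, 3, 2)
)

base <- ggplot(test_subset, aes(x = var1, y = value)) +
  facet_grid(~variable) + coord_flip()

# Two point layers: every row is drawn twice (once exact, once jittered)
p_both <- base + geom_point() + geom_jitter(width = 0.2, height = 0.2)

# One point layer: every row is drawn once
p_jitter <- base + geom_jitter(width = 0.2, height = 0.2)

n_both   <- nrow(layer_data(p_both, 1)) + nrow(layer_data(p_both, 2))
n_jitter <- nrow(layer_data(p_jitter, 1))
print(c(both = n_both, jitter_only = n_jitter))   # 36 and 18
```

With both geoms, 36 points are drawn for 18 rows; fewer than 36 are visible only because the unjittered copies overlap wherever values coincide. For the boxplot overlay mentioned in the question, the same rule should apply: something like geom_boxplot(outlier.shape = NA) followed by a single geom_jitter() layer, with no separate geom_point().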
