Identify observations excluded by filtering with dplyr - R
When using dplyr on large data frames, I often apply multiple filtering conditions. Often I could combine them in a single filter() call. However, I love how dplyr lets you think through what you are doing with the data step by step, so these filters often end up on consecutive lines.
However, I often want to store not only the observations produced by these sequential filters in a new df, but also, in a separate df, the observations from the original df that were filtered out.
For example, this dataset:
set.seed(123)
colors <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes <- c("circle", "star", "oblong")
numbers <- sample(1:15, replace = TRUE)
group <- sample(LETTERS, 15, replace = TRUE)
mydf <- data.frame(colors, shapes, numbers, group)
mydf
   colors shapes numbers group
1  yellow circle       5     X
2  yellow   star      12     G
3  yellow oblong       7     B
4  yellow circle      14     I
5  yellow   star      15     Y
6    blue oblong       1     X
7    blue circle       8     S
8    blue   star      14     Q
9    blue oblong       9     Z
10   blue circle       7     R
11  green   star      15     S
12  green oblong       7     O
13  green circle      11     P
14  green   star       9     H
15  green oblong       2     D
Here, let's say I would like to filter by the following rules (I know it might make sense to filter in a different order, for example by color first, but for the sake of argument):
library(dplyr)
mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")
which returns:
  colors shapes numbers group
1 yellow circle       5     X
2 yellow   star      15     Y
My question is: how can I store the 13 observations from the original "mydf" that are not returned by the filter in a separate df? Is there a neat way to do this with dplyr?
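One possible approach, offered as a sketch rather than a definitive answer: tag each row with an identifier before filtering, then use dplyr's anti_join() to recover the rows that the filter chain dropped. The column name row_id and the object names mydf_id, kept, and dropped are hypothetical, introduced only for this illustration. The data are typed in from the printed table above instead of being sampled, so the result does not depend on the random number generator.

```r
library(dplyr)

# Example data copied from the printed table (not re-sampled)
mydf <- data.frame(
  colors  = rep(c("yellow", "blue", "green"), each = 5),
  shapes  = rep(c("circle", "star", "oblong"), times = 5),
  numbers = c(5, 12, 7, 14, 15, 1, 8, 14, 9, 7, 15, 7, 11, 9, 2),
  group   = c("X", "G", "B", "I", "Y", "X", "S", "Q", "Z", "R",
              "S", "O", "P", "H", "D"),
  stringsAsFactors = FALSE
)

# row_id is a hypothetical key column; it keeps rows distinguishable
# even if two rows happen to have identical values in every other column
mydf_id <- mydf %>% mutate(row_id = row_number())

# Same sequential filters as in the question
kept <- mydf_id %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")

# Rows of the original data whose row_id does not appear in kept
dropped <- anti_join(mydf_id, kept, by = "row_id")
```

Here kept holds the 2 rows shown above and dropped holds the remaining 13. The explicit key is a design choice: joining on all columns would also work for this data, but it would treat exact duplicate rows as the same observation.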