Define observations not included by filtering with dplyr - R

Question

When using dplyr on large data frames, I often apply several filtering conditions. Often I could combine them in a single filter call. However, I love how dplyr lets you think through what you are doing with the data step by step, so these filters often end up on successive lines.

However, I often want to store not only the observations produced by these sequential filters in a new df, but also, in a separate df, the observations from the original df that were filtered out.

For example, this dataset:

set.seed(123)
colors <- c(rep("yellow", 5), rep("blue", 5), rep("green", 5))
shapes <- c("circle", "star", "oblong")  # recycled to length 15 by data.frame()
numbers <- sample(1:15, replace = TRUE)
group <- sample(LETTERS, 15, replace = TRUE)
mydf <- data.frame(colors, shapes, numbers, group)
mydf


   colors shapes numbers group
1  yellow circle       5     X
2  yellow   star      12     G
3  yellow oblong       7     B
4  yellow circle      14     I
5  yellow   star      15     Y
6    blue oblong       1     X
7    blue circle       8     S
8    blue   star      14     Q
9    blue oblong       9     Z
10   blue circle       7     R
11  green   star      15     S
12  green oblong       7     O
13  green circle      11     P
14  green   star       9     H
15  green oblong       2     D

Here, let's say I would like to filter by the following rules (I know it might make sense to filter in a different order, for example by color first, but for the sake of the argument):

library(dplyr)

mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group == "X" | group == "Y" | group == "Z") %>%
  filter(colors == "yellow")

which returns:

  colors shapes numbers group
1 yellow circle       5     X
2 yellow   star      15     Y

My question is: how can I store the 13 observations from the original "mydf" that were not returned by the filter in a separate df? Is there a neat way to do this in dplyr?

filter r dataframe dplyr

jalapic 24 Aug 14 at 14:51

1 answer

lukeA · Accepted Answer · 2014-08-25T23:09:32+0000

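I suggest storing the filtered result (called mydf.filtered here) and taking the set difference with the original: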

sepDf <- setdiff(mydf, mydf.filtered)
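
To make the suggestion concrete, here is a minimal sketch of the whole workflow. It assumes that mydf.filtered is simply the stored result of the question's filter chain, and it rebuilds the example data from the printed table so the result does not depend on the random sampler. The anti_join and row_id variants are alternatives added for illustration, not part of the original answer, and row_id is a hypothetical helper column.

```r
library(dplyr)

# Example data copied from the table printed in the question.
mydf <- data.frame(
  colors  = rep(c("yellow", "blue", "green"), each = 5),
  shapes  = rep(c("circle", "star", "oblong"), times = 5),
  numbers = c(5, 12, 7, 14, 15, 1, 8, 14, 9, 7, 15, 7, 11, 9, 2),
  group   = c("X", "G", "B", "I", "Y", "X", "S", "Q", "Z", "R",
              "S", "O", "P", "H", "D"),
  stringsAsFactors = FALSE
)

# Store the filtered rows under a name so they can be compared with mydf.
mydf.filtered <- mydf %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group %in% c("X", "Y", "Z")) %>%
  filter(colors == "yellow")

# Option 1: set difference, as in the answer. Rows are treated as a set,
# so exact duplicate rows in mydf would be collapsed into one.
sepDf <- setdiff(mydf, mydf.filtered)

# Option 2: anti_join on all columns. Keeps every row of mydf that has
# no match in mydf.filtered, including duplicates.
sepDf2 <- anti_join(mydf, mydf.filtered, by = names(mydf))

# Option 3: tag each row with an id before filtering (row_id is a
# hypothetical helper column), then select the ids that were dropped.
tagged <- mydf %>% mutate(row_id = row_number())
kept <- tagged %>%
  filter(numbers <= 5 | numbers >= 12) %>%
  filter(group %in% c("X", "Y", "Z")) %>%
  filter(colors == "yellow")
dropped <- tagged %>% filter(!row_id %in% kept$row_id)
```

On this data all three variants leave the same 13 rows. They can differ when the original data frame contains exact duplicate rows: the set difference collapses them, the anti join removes every copy of a row that was kept, and the row id approach tracks each row individually.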
