Subset groups that contain multiple values in a specific order

In my data frame, I have two columns of interest: id and name. My goal is to keep only the records for an id where there is more than one value in name and where the last value in name is "B".

Sample data will look like this:

> test
   id name
1   1    A
2   2    A
3   3    A
4   4    A
5   5    A
6   6    A
7   7    A
8   2    B
9   1    B
10  2    A

      

and the result will look like this:

> output
   id name
1   1    A
9   1    B

      

How would you filter to get these rows in R? I know that you can filter for ids with multiple values using the %in% operator, but I'm not sure how to add the condition that "B" should be the last entry. I don't mind using a package like dplyr, but a base R solution would be perfect. Any suggestions?

Here's some sample data:

> dput(test)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 2, 1, 2), name = c("A", 
"A", "A", "A", "A", "A", "A", "B", "B", "A")), .Names = c("id", 
"name"), row.names = c(NA, -10L), class = "data.frame")

      



2 answers


Using dplyr:

library(dplyr)

# Keep ids that have more than one distinct name and whose last name is "B"
test %>% 
 group_by(id) %>% 
 filter(n_distinct(name) > 1 & last(name) == 'B')

#Source: local data frame [2 x 2]
#Groups: id [1]

# A tibble: 2 x 2
#     id  name
#  <dbl> <chr>
#1     1     A
#2     1     B

      



In data.table:



library(data.table)
setDT(test)[, .SD[length(unique(name)) >= 2 & name[.N] == "B"],by = .(id)]
#   id name
#1:  1    A
#2:  1    B
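
Neither answer above covers the base R route the question asks about. The following is a minimal sketch of one way it might be done, assuming the rows are already in the order that defines the "last" entry for each id. The names ok and output are illustrative only; tapply computes one TRUE/FALSE flag per id, and %in% then selects the rows of the qualifying ids.

```r
# Sample data from the question
test <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 2, 1, 2),
  name = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# One flag per id: TRUE when the id has more than one distinct name
# and the last name for that id (in row order) is "B".
ok <- tapply(test$name, test$id, function(x) {
  length(unique(x)) > 1 && x[length(x)] == "B"
})

# Keep every row whose id was flagged TRUE.
# names(ok) holds the ids as character strings; %in% coerces test$id to match.
output <- test[test$id %in% names(ok)[ok], ]
output
```

If the data are not already sorted, the rows would need to be ordered first (for example by a timestamp column, which this sample does not have) for "last" to be meaningful.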

      
