Determine which list objects are contained (subset) in another list in R
Thanks for your kind answers to my previous questions. I have two lists, list1 and list2, and I would like to know whether each list1 object is contained in each list2 object. For example:
> list1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> list2
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3
[[3]]
[1] 2 3
Here are my questions:
1.) How do you ask R to check whether one object is a subset of another object in a list? For example, I would like to check whether list2[[3]] = {2,3} contains list1[[2]] = {2} as a subset. When I do list2[[3]] %in% list1[[2]], I receive [1] TRUE FALSE. However, this is not what I want! I just want to check whether list2[[3]] is a subset of list1[[2]], i.e. is {2,3} a subset of {2} in the set-theoretic sense? I don't want an element-wise check, which is what R seems to do with the %in% operator. Any suggestions?
2.) Is there an efficient way to perform all pairwise subset comparisons (i.e., is list1[[i]] a subset of list2[[j]], for all combinations of i and j)? Would something like outer(list1, list2, func.subset) work once question 1 is answered? Thank you for your feedback!
setdiff compares unique values, so x is a subset of y exactly when setdiff(x, y) is empty:
length(setdiff(5, 1:5)) == 0
Alternatively, all(x %in% y) works just as well.
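As a quick check of the two tests above on the data from the question, here is a minimal sketch. The helper names is_subset_setdiff and is_subset_in are made up for illustration, and list1 and list2 are rebuilt from the printed output in the question.

```r
# Data reconstructed from the question's printed lists (an assumption)
list1 <- list(1, 2, 3)
list2 <- list(1:3, 2:3, 2:3)

# Hypothetical helper names; both return a single TRUE/FALSE
is_subset_setdiff <- function(x, y) length(setdiff(x, y)) == 0
is_subset_in <- function(x, y) all(x %in% y)

is_subset_setdiff(list1[[2]], list2[[3]])  # TRUE: {2} is a subset of {2,3}
is_subset_in(list1[[2]], list2[[3]])       # TRUE
is_subset_in(list2[[3]], list1[[2]])       # FALSE: {2,3} is not a subset of {2}
```

Wrapping %in% in all() is what turns the element-wise result (TRUE FALSE) into the single set-level answer the question asks for.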
To do all the comparisons, something like this will work:
dt <- expand.grid(list1,list2)
dt$subset <- apply(dt,1, function(.v) all(.v[[1]] %in% .v[[2]]) )
  Var1    Var2 subset
1    1 1, 2, 3   TRUE
2    2 1, 2, 3   TRUE
3    3 1, 2, 3   TRUE
4    1    2, 3  FALSE
5    2    2, 3   TRUE
6    3    2, 3   TRUE
7    1    2, 3  FALSE
8    2    2, 3   TRUE
9    3    2, 3   TRUE
Note that expand.grid is not the fastest way to do this when dealing with a lot of data (DWin's solution is better in this regard), but it lets you quickly check visually that it does what you want.
You can use the sets package like this:
library(sets)
is.subset <- function(x, y) as.set(x) <= as.set(y)
outer(list1, list2, Vectorize(is.subset))
#      [,1]  [,2]  [,3]
# [1,] TRUE FALSE FALSE
# [2,] TRUE  TRUE  TRUE
# [3,] TRUE  TRUE  TRUE
@Michael's or @DWin's basic version of the subset test would work just as well, but for the second part of your question, I would say outer is the way to go.
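If you would rather not depend on the sets package, the same outer call can probably be driven by a base R test. This is only a sketch: is_subset is a hypothetical helper built on all(x %in% y), and list1 and list2 are rebuilt from the question's printed output.

```r
# Data reconstructed from the question's printed lists (an assumption)
list1 <- list(1, 2, 3)
list2 <- list(1:3, 2:3, 2:3)

# Hypothetical base R subset test, no extra package needed
is_subset <- function(x, y) all(x %in% y)

# outer() expects a vectorized function, hence Vectorize();
# rows index list1, columns index list2
res <- outer(list1, list2, Vectorize(is_subset))
res
```

Here res[i, j] answers whether list1[[i]] is a subset of list2[[j]], matching the layout of the matrix shown above.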
is.subset <- function(x,y) {length(setdiff(x,y)) == 0}
First, the combinations of list1 elements that are subsets of list2 elements (columns index list1, rows index list2):
> sapply(1:length(list1), function(i1) sapply(1:length(list2),
function(i2) is.subset(list1[[i1]], list2[[i2]]) ) )
      [,1] [,2] [,3]
[1,]  TRUE TRUE TRUE
[2,] FALSE TRUE TRUE
[3,] FALSE TRUE TRUE
Then, unsurprisingly, none of the list2 elements (all of length > 1) is a subset of any list1 element (all of length 1):
> sapply(1:length(list1), function(i1) sapply(1:length(list2),
function(i2) is.subset(list2[[i2]], list1[[i1]]) ) )
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
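To go from such a logical matrix to the actual "which is contained in which" pairs, one option is which() with arr.ind = TRUE. The sketch below rebuilds the first matrix with seq_along; the dimnames labels and the variable names m and pairs are made up for illustration, and the lists are reconstructed from the question.

```r
# Data reconstructed from the question's printed lists (an assumption)
list1 <- list(1, 2, 3)
list2 <- list(1:3, 2:3, 2:3)
is.subset <- function(x, y) length(setdiff(x, y)) == 0

# Same nested sapply as above; seq_along() is safer than 1:length()
# because it also behaves correctly for empty lists
m <- sapply(seq_along(list1), function(i1)
  sapply(seq_along(list2), function(i2) is.subset(list1[[i1]], list2[[i2]])))

# Illustrative labels: rows are list2 elements, columns are list1 elements
dimnames(m) <- list(paste0("list2_", seq_along(list2)),
                    paste0("list1_", seq_along(list1)))

# One row per TRUE cell: "row" is the list2 index, "col" the list1 index
pairs <- which(m, arr.ind = TRUE)
pairs
```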
Adding to @Michael's answer, here's a neat way to avoid the expand.grid clutter with the AsIs function, I():
list2 <- list(1:3,2:3,2:3)
a <- data.frame(list1 = 1:3, I(list2))
a$subset <- apply(a, 1, function(.v) all(.v[[1]] %in% .v[[2]]) )
  list1   list2 subset
1     1 1, 2, 3   TRUE
2     2    2, 3   TRUE
3     3    2, 3   TRUE
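Note that this compares the lists position by position (list1[[i]] against list2[[i]]) rather than all pairs. If that is all you need, a plain mapply call is a possible alternative that skips the data frame entirely. This is a sketch, not part of the answer above; the name pairwise is made up, and the lists are rebuilt from the question.

```r
# Data reconstructed from the question's printed lists (an assumption)
list1 <- list(1, 2, 3)
list2 <- list(1:3, 2:3, 2:3)

# Element i of the result tests whether list1[[i]] is a subset of list2[[i]]
pairwise <- mapply(function(x, y) all(x %in% y), list1, list2)
pairwise  # TRUE TRUE TRUE
```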