# Check if a pair of columns is in a row of a dataframe

I would like to know if there is an efficient way to check whether a given pair (or tuple of more than two) of values appears together in some row of a data frame, across the corresponding columns.

For example, suppose I have the following dataframe:

```
df=data.frame(c("a","b","c","d"),c("e","f","g","h"),c(1,0,0,1))
names(df)=c('col1','col2','col3')
df
#>   col1 col2 col3
#> 1    a    e    1
#> 2    b    f    0
#> 3    c    g    0
#> 4    d    h    1
```

and I want to check whether the table contains each of the following (col1, col2) pairs: (a, b), (a, c), (a, e), (c, a), (c, g), (a, f)

for which the expected output is:

```
FALSE FALSE TRUE FALSE TRUE FALSE
```

Edit: Added a new pair (a, f) to avoid confusion

I thought about concatenating the columns of each row into a single string and then comparing with `%in%`, but that seems pretty inefficient. I also thought about looping with a dplyr filter, but that also takes quite a long time when the table is huge, and it needs format conversions (i.e. writing multiple lines).

Is there an efficient way to accomplish this in R?

This is a good case for one of the `apply` family of functions, such as `lapply`. If you define `pairs.list` as a `list` of pairs, you can use `lapply`:

```
df = data.frame(c("a","b","c","d"), c("e","f","g","h"), c(1,0,0,1))
names(df) = c('col1','col2','col3')
pairs.list = list(c("a", "b"), c("a", "c"), c("a", "e"), c("c", "a"), c("c", "g"))
# For each pair, TRUE if some row has col1 == x[[1]] and col2 == x[[2]]
lapply(pairs.list, FUN=function(x){any(df$col1==x[[1]] & df$col2==x[[2]])})
#> [[1]]
#> [1] FALSE
#> [[2]]
#> [1] FALSE
#> [[3]]
#> [1] TRUE
#> [[4]]
#> [1] FALSE
#> [[5]]
#> [1] TRUE
# Order matters: ("e", "a") is not the same pair as ("a", "e")
new.pairs = list(c("a", "b"), c("a", "c"), c("e", "a"), c("c", "a"), c("c", "g"))
lapply(new.pairs, FUN=function(x){any(df$col1==x[[1]] & df$col2==x[[2]])})
#> [[1]]
#> [1] FALSE
#> [[2]]
#> [1] FALSE
#> [[3]]
#> [1] FALSE
#> [[4]]
#> [1] FALSE
#> [[5]]
#> [1] TRUE
```

With this method, if you want to find out which rows of `df` match, you can drop the `any()` call and get a list of logical vectors instead, where each vector has one element per row of `df` (i.e. length `nrow(df)`).
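A minimal sketch of the `any()`-free variant, reusing the same data and pairs; `row.hits` and `row.idx` are illustrative names, and `which()` is one way (not the only one) to turn each logical vector into the indices of the matching rows:

```r
df <- data.frame(col1 = c("a", "b", "c", "d"),
                 col2 = c("e", "f", "g", "h"),
                 col3 = c(1, 0, 0, 1))
pairs.list <- list(c("a", "b"), c("a", "c"), c("a", "e"), c("c", "a"), c("c", "g"))

# Without any(): one logical vector per pair, with one element per row of df
row.hits <- lapply(pairs.list, function(x) df$col1 == x[[1]] & df$col2 == x[[2]])

# which() converts each logical vector into the indices of the matching rows
row.idx <- lapply(row.hits, which)
row.idx[[3]]  # pair (a, e)
#> [1] 1
row.idx[[1]]  # pair (a, b) matches no row
#> integer(0)
```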

I think it should be relatively efficient because it uses vectorized logical comparisons rather than string manipulation, but I'm not an expert on benchmarking performance in R, so I don't know for sure.
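The question also mentions tuples of more than two columns. One way to extend the same idea, sketched here under the assumption that each tuple lists its values in the same order as a vector of column names, is to build one comparison per column with `Map()` and combine them with `Reduce()`; `vapply()` then returns a plain logical vector in the format the question asks for. `tuples_in_df` is a hypothetical helper, not an existing function:

```r
# Hypothetical helper: for each tuple, TRUE if some row of `df` has
# tuple[[i]] in column cols[i] for every i.
tuples_in_df <- function(df, tuples, cols) {
  vapply(tuples, function(tup) {
    # One logical vector (one element per row) for each column/value pair
    hits <- Map(function(col, val) df[[col]] == val, cols, tup)
    # A row matches only if every column matches; any() checks for such a row
    any(Reduce(`&`, hits))
  }, logical(1))
}

df <- data.frame(col1 = c("a", "b", "c", "d"),
                 col2 = c("e", "f", "g", "h"),
                 col3 = c(1, 0, 0, 1))

query.pairs <- list(c("a", "b"), c("a", "c"), c("a", "e"),
                    c("c", "a"), c("c", "g"), c("a", "f"))
tuples_in_df(df, query.pairs, c("col1", "col2"))
#> [1] FALSE FALSE  TRUE FALSE  TRUE FALSE

# Triples across all three columns
triples <- list(list("a", "e", 1), list("b", "f", 1))
tuples_in_df(df, triples, c("col1", "col2", "col3"))
#> [1]  TRUE FALSE
```

Passing the triples as `list()`s rather than `c()` keeps the value `1` numeric when it is compared with `col3`; with `c("a", "e", "1")` R would instead coerce `col3` to character for that comparison.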

If you only need to check whether each combination of column values is in the table, you can paste the columns together into a single key and use `unique` to reduce the number of comparisons:

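```
df=data.frame(c("a","b","c","d"),c("e","f","g","h"),c(1,0,0,1), stringsAsFactors=FALSE)
names(df)=c('col1','col2','col3')
# Combine the two columns into one "col1,col2" key per row
df$to_check = paste(df$col1, df$col2, sep=',')
# Pairs to look up, written in the same "col1,col2" format
cols <- c("a,b", "a,c", "a,e", "c,a", "c,g")
# %in% only has to search the distinct keys
cols %in% unique(df$to_check)
#> [1] FALSE FALSE  TRUE FALSE  TRUE
```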
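The paste-based check can be generalized to any set of key columns with `do.call(paste, ...)`. The sketch below assumes the unit-separator character `"\x1f"` never occurs in the data; a visible separator like `,` can make different pairs collide (for example, `paste("a,b", "c", sep = ",")` and `paste("a", "b,c", sep = ",")` both give `"a,b,c"`). `key.cols`, `queries` and the other variable names are illustrative:

```r
df <- data.frame(col1 = c("a", "b", "c", "d"),
                 col2 = c("e", "f", "g", "h"),
                 col3 = c(1, 0, 0, 1))
key.cols <- c("col1", "col2")

# Assumption: "\x1f" (ASCII unit separator) never appears in the key columns
sep <- "\x1f"
# One pasted key per row; unique() drops duplicate combinations
row.keys <- unique(do.call(paste, c(df[key.cols], sep = sep)))

# Query pairs, one vector per key column, pasted the same way
queries <- list(col1 = c("a", "a", "a", "c", "c", "a"),
                col2 = c("b", "c", "e", "a", "g", "f"))
query.keys <- do.call(paste, c(queries, sep = sep))

query.keys %in% row.keys
#> [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
```

Adding a third name to `key.cols` (and a matching vector to `queries`) extends the same check to triples.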