Counting matching rows between two data.tables

If I have two data.tables, dt1 and dt2, I want the number of matches between columns using if-then logic: if dt1$V1 == dt2$V1, then does dt1$V2 == dt2$V2? The key for this if-then statement is to group by the matches in dt1$V1 == dt2$V1. I would like to use data.table for its efficiency, since I actually have a large dataset.

library(data.table)

# Unnamed columns get the default names V1 and V2
dt1 <- data.table(c("a","b","c","d","e"), c(1:5))
dt2 <- data.table(c("a","d","e","f","g"), c(3:7))

      

In this dummy example, there are 3 matches on V1 (a, d and e), but only two of those also match on V2 (d and e). So the answer (perhaps using nrow on a subset) would be 2.


3 answers


I assume you are looking for fintersect:

fintersect(dt1,dt2)

      

gives:

   V1 V2
1:  d  4
2:  e  5

      



To get the number of rows, add [, .N]:

fintersect(dt1,dt2)[, .N]

      

which gives:

[1] 2
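One thing worth checking on a real dataset is duplicated rows. As far as I know, fintersect drops duplicates by default (all = FALSE) and keeps them with all = TRUE, so the two counts can differ. A minimal sketch, where the tables x and y are made up for illustration and are not part of the question:

```r
library(data.table)

# Hypothetical tables with a duplicated (d, 4) row in each
x <- data.table(V1 = c("d", "d", "e"), V2 = c(4L, 4L, 5L))
y <- data.table(V1 = c("d", "d", "e"), V2 = c(4L, 4L, 5L))

# Default: duplicated rows are collapsed before counting
n_unique <- fintersect(x, y)[, .N]

# all = TRUE: duplicated rows are kept in the intersection
n_all <- fintersect(x, y, all = TRUE)[, .N]

print(c(unique = n_unique, all = n_all))
```

Which of the two is the "right" count depends on whether repeated rows should count as separate matches.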

      


Well, it's not pretty, but it works:

sum(dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2)

      

I just read your comment that you want a data.table back. With the right subsetting you can get that too, although the expression gets even longer, for example:



dt1[V1 %in% dt2$V1][dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2]

    V1 V2
1:  d  4
2:  e  5
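A caveat with the %in% approach: it compares the two V2 vectors position by position, so it seems to rely on the shared V1 values appearing in the same order (and only once) in both tables. A small sketch of what happens otherwise, where dt2s is just a reordered copy of dt2 made up for this illustration:

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
dt2 <- data.table(V1 = c("a", "d", "e", "f", "g"), V2 = 3:7)

# Hypothetical reordering of dt2: rows e, d, a, f, g
dt2s <- dt2[c(3, 2, 1, 4, 5)]

# Positional comparison: V2 values are paired by position, not by V1
order_dependent <- sum(dt1[V1 %in% dt2s$V1]$V2 == dt2s[V1 %in% dt1$V1]$V2)

# Row-wise intersection: does not depend on row order
order_independent <- fintersect(dt1, dt2s)[, .N]

print(c(positional = order_dependent, fintersect = order_independent))
```

Sorting both tables by V1 first (for example with setkey) should make the positional version safe again, as long as V1 is unique in each table.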

      

I am looking forward to other answers :)


We can just do a join on all columns:

dt1[dt2, on = names(dt1), nomatch = 0]
#   V1 V2
#1:  d  4
#2:  e  5

      


Or inner_join from dplyr:

library(dplyr)
inner_join(dt1, dt2)
#  V1 V2
#1  d  4
#2  e  5

      


Or using merge, which joins on the shared column names by default:

merge(dt1, dt2)
#   V1 V2
#1:  d  4
#2:  e  5

      


For all of the above, the number of matches can be found with nrow:

nrow(merge(dt1, dt2))
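If both numbers from the question are wanted (the 3 matches on V1 and the 2 of those that also match on V2), one possible sketch is to merge on V1 only and then compare the two V2 columns. The suffixes .dt1 and .dt2 and the result names are chosen here purely for illustration:

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
dt2 <- data.table(V1 = c("a", "d", "e", "f", "g"), V2 = 3:7)

# Inner join on V1 only; the clashing V2 columns get the given suffixes
by_v1 <- merge(dt1, dt2, by = "V1", suffixes = c(".dt1", ".dt2"))

# Count rows matched on V1, and how many of those also agree on V2
counts <- by_v1[, .(v1_matches = .N, v2_matches = sum(V2.dt1 == V2.dt2))]
print(counts)
```

This keeps the "group by matches in V1" step explicit, at the cost of materialising the joined table.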

      
