Counting matches across columns of two data.tables
If I have two data.tables, dt1 and dt2, I want the number of matches between columns using if-then logic: if dt1$V1 == dt2$V1, then does dt1$V2 == dt2$V2? The key to this if-then statement is that the second comparison is grouped by the matches in dt1$V1 == dt2$V1. I would like to use data.table for its efficiency, since I actually have a large dataset.
library(data.table)
dt1 <- data.table(c("a","b","c","d","e"), c(1:5))
dt2 <- data.table(c("a","d","e","f","g"), c(3:7))
In this dummy example, there are 3 matches between the V1 columns, but only two of those rows also match on V2. So the answer (perhaps using nrow, if I take a subset) would be 2.
Well, it's not pretty, but it works:
sum(dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2)
I just read your comment that you want a data.table back. With the same subsetting, you could do that with an even longer expression, for example:
dt1[V1 %in% dt2$V1][dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2]
   V1 V2
1:  d  4
2:  e  5
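Both expressions above compare V2 by position, so they depend on the two filtered tables listing the shared V1 values in the same order. A minimal sketch of an order-independent variant, using base R's match() to look up each dt1 key in dt2, might look like this. It assumes the V1 values in dt2 are unique (match() returns only the first hit), and idx and n_matches are names made up for illustration:

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
dt2 <- data.table(V1 = c("a", "d", "e", "f", "g"), V2 = 3:7)

# Position in dt2 of each dt1 key; NA where the key is absent from dt2
idx <- match(dt1$V1, dt2$V1)

# Compare V2 only for keys found in both tables; NA comparisons are dropped
n_matches <- sum(dt1$V2 == dt2$V2[idx], na.rm = TRUE)
n_matches
# 2
```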
I am looking forward to other answers :)
We can simply do a join:
dt1[dt2, on = names(dt1), nomatch = 0]
# V1 V2
#1: d 4
#2: e 5
or use inner_join from dplyr:
library(dplyr)
inner_join(dt1, dt2)
# V1 V2
#1 d 4
#2 e 5
Or use merge:
merge(dt1, dt2)
# V1 V2
#1: d 4
#2: e 5
For all of the above, the number of matches can be found with nrow:
nrow(merge(dt1, dt2))
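To tie this back to the numbers in the question, here is a small self-contained sketch that counts the V1-only matches and the matches on both columns separately. It names the join columns explicitly and uses nomatch = NULL, which the data.table documentation describes as the current spelling of nomatch = 0. The variable names n_key and n_both are hypothetical:

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
dt2 <- data.table(V1 = c("a", "d", "e", "f", "g"), V2 = 3:7)

# Inner join on V1 only: rows whose key appears in both tables
n_key <- nrow(dt1[dt2, on = "V1", nomatch = NULL])

# Inner join on V1 and V2: rows that agree on both columns
n_both <- nrow(dt1[dt2, on = c("V1", "V2"), nomatch = NULL])

c(n_key, n_both)
# 3 2
```

Like any join, this counts row pairs, so duplicated V1/V2 combinations in either table would be counted more than once.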