Unexpected result from data.table when viewed in another table
I am trying to check if a value from a data table is present in another data table. However, I am not getting the correct output:
> dt1 <- data.table(x=c(8,5,3), y=rnorm(3))
> dt2 <- data.table(a=c(1,2,3,4,5), b=rnorm(5))
> setkey(dt1,x)
> setkey(dt2,a)
>
> dt1
x y
1: 3 0.84929113
2: 5 1.33433818
3: 8 0.04170333
> dt2
a b
1: 1 2.00634915
2: 2 -1.53137195
3: 3 -1.49436741
4: 4 -1.66878993
5: 5 -0.06394713
>
> dt1[,is_present_in_dt2:=nrow(dt2[x, nomatch=0L])]
> dt1
x y is_present_in_dt2
1: 3 0.84929113 3
2: 5 1.33433818 3
3: 8 0.04170333 3
Expected result:
x y is_present_in_dt2
1: 3 0.84929113 1
2: 5 1.33433818 1
3: 8 0.04170333 0
+3
source to share
1 answer
I think this is actually more straight forward than you might think. Think of it as substituting d1 with d2 in the i statement.
dt1 <- data.table(x=c(8,5,3), y=rnorm(3))
dt2 <- data.table(a=c(1,2,3,4,5), b=rnorm(5))
setkey(dt1,x)
setkey(dt2,a)
dt1[dt2, presnt := 1] #Where they merge, make it a 1
dt1[!dt2, presnt := 0] #Where they don't merge, make it a 0
And the result:
x y presnt
1: 3 -0.6938894 1
2: 5 0.4891611 1
3: 8 -1.8227498 0
And another way to think about it:
overlap <- intersect(dt1$x,dt2$a)
dt1[x %in% overlap, present := 1]
dt1[!(x %in% overlap), present := 0]
The first way is much faster. The second way can help you understand the first way.
+6
source to share