Unexpected result from data.table when viewed in another table

Question

I am trying to check whether each value in one data.table is present in another data.table. However, I am not getting the output I expect:

> dt1 <- data.table(x=c(8,5,3), y=rnorm(3))
> dt2 <- data.table(a=c(1,2,3,4,5), b=rnorm(5))
> setkey(dt1,x)
> setkey(dt2,a)
> 
> dt1
   x          y
1: 3 0.84929113
2: 5 1.33433818
3: 8 0.04170333
> dt2
   a           b
1: 1  2.00634915
2: 2 -1.53137195
3: 3 -1.49436741
4: 4 -1.66878993
5: 5 -0.06394713
> 
> dt1[,is_present_in_dt2:=nrow(dt2[x, nomatch=0L])]
> dt1
   x          y is_present_in_dt2
1: 3 0.84929113                 3
2: 5 1.33433818                 3
3: 8 0.04170333                 3

Expected result:


   x          y is_present_in_dt2
1: 3 0.84929113                 1
2: 5 1.33433818                 1
3: 8 0.04170333                 0

r data.table

poiuytrez 12 Sep 14 at 12:23

1 answer

Mike.Gahan · Accepted Answer · 2014-09-12T13:08:31+0000

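I think this is actually more straightforward than you might think. Use dt2 as the i argument of dt1: because both tables are keyed, dt1[dt2, ...] joins on the keys and updates the rows of dt1 that have a match in dt2, and dt1[!dt2, ...] updates the rows that have no match.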

dt1 <- data.table(x=c(8,5,3), y=rnorm(3))
dt2 <- data.table(a=c(1,2,3,4,5), b=rnorm(5))
setkey(dt1,x)
setkey(dt2,a)

dt1[dt2, present := 1]  # where the keys match, set the flag to 1
dt1[!dt2, present := 0] # where they don't match, set the flag to 0

And the result:

   x          y present
1: 3 -0.6938894       1
2: 5  0.4891611       1
3: 8 -1.8227498       0
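
The same join-and-update idea can also be sketched without calling setkey first, by naming the join columns in the on= argument (available in more recent data.table versions, 1.9.6 or later as far as I know). This is only a sketch and not part of the code above; the column name present and the integer 0L/1L values are choices made for illustration.

```r
library(data.table)

dt1 <- data.table(x = c(8, 5, 3), y = rnorm(3))
dt2 <- data.table(a = c(1, 2, 3, 4, 5), b = rnorm(5))

# Start with "not present" everywhere (illustrative column name)
dt1[, present := 0L]

# Join dt1$x to dt2$a and overwrite the flag only on matching rows
dt1[dt2, on = c(x = "a"), present := 1L]

print(dt1)  # present is 0, 1, 1 for x = 8, 5, 3
```

Since no key is set here, dt1 keeps its original row order (8, 5, 3) instead of being sorted by x.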

And another way to think about it:

overlap <- intersect(dt1$x,dt2$a)
dt1[x %in% overlap, present := 1]
dt1[!(x %in% overlap), present := 0]

The first way is much faster. The second way can help you understand the first way.
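
If a 0/1 column is all you need, the %in% idea can probably be collapsed into a single assignment. A minimal sketch, assuming an integer flag is acceptable (as.integer turns TRUE/FALSE into 1/0); the column name present is just illustrative:

```r
library(data.table)

dt1 <- data.table(x = c(8, 5, 3), y = rnorm(3))
dt2 <- data.table(a = c(1, 2, 3, 4, 5), b = rnorm(5))

# x %in% dt2$a is evaluated once for the whole column and yields one
# TRUE/FALSE per row of dt1; as.integer() converts that to 1/0
dt1[, present := as.integer(x %in% dt2$a)]

print(dt1)  # present is 0, 1, 1 for x = 8, 5, 3
```

This works because %in% is vectorised and returns one value per row. An expression in j that returns a single number is instead computed once and recycled to every row unless by is used, which is why the attempt in the question shows the same count on every row.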
