Strange behavior of subset with multiple keys using data.table

I have set multiple keys on a data.table, but when I try to select rows by multiple key values, the result seems to contain a row for each potential combination, populated with NA for the combinations that don't exist.

I can get the sample code in 1c of this document to work, so it must be something I'm just not seeing. Any help would be much appreciated.


library(data.table)

dt = data.table(colA = 1:4,
                colB = c("A","A","B","B"),
                colC = 11:14)
setkey(dt, colA, colB)

dt
#    colA colB colC
# 1:    1    A   11
# 2:    2    A   12
# 3:    3    B   13
# 4:    4    B   14

dt[.(2, "A")]
# As expected
#    colA colB colC
# 1:    2    A   12

dt[.(c(2, 3), "A")]
#    colA colB colC
# 1:    2    A   12
# 2:    3    A   NA #Unexpected

dt[.(unique(colA), "A")]
#    colA colB colC
# 1:    1    A   11
# 2:    2    A   12
# 3:    3    A   NA #Unexpected
# 4:    4    A   NA #Unexpected




1 answer


DT[i] looks up each row of i in the rows of DT. By default, rows of i that have no match are shown with NA. To drop the unmatched rows instead, use nomatch = 0:


dt[.(unique(colA),"A"), nomatch=0]

#    colA colB colC
# 1:    1    A   11
# 2:    2    A   12


The nomatch argument is covered in the vignette the OP linked. To find the latest version of the vignette, use browseVignettes("data.table").


As a side note, there is no need to set keys before joining. You can use on= instead:


dt2 = data.table(colA = 1:4,
                 colB = c("A","A","B","B"),
                 colC = 11:14)

dt2[.(unique(colA),"A"), on=.(colA, colB), nomatch=0]

#    colA colB colC
# 1:    1    A   11
# 2:    2    A   12
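
As a further sketch under the same assumptions, on= also accepts a character vector of column names, and an anti-join in the other direction lists the requested pairs that do not exist in the table. The names wanted, res and missing_pairs are hypothetical, chosen only for this example.

```r
library(data.table)

dt2 = data.table(colA = 1:4,
                 colB = c("A","A","B","B"),
                 colC = 11:14)

haskey(dt2)  # FALSE: no setkey() call was made

# Illustrative table of requested (colA, colB) pairs.
wanted = data.table(colA = 2:3, colB = "A")

# Join on named columns without a key, dropping unmatched requests.
res = dt2[wanted, on = c("colA", "colB"), nomatch = NULL]

# Anti-join: rows of wanted that have no match in dt2.
missing_pairs = wanted[!dt2, on = c("colA", "colB")]

res            # one row: colA 2, colB "A", colC 12
missing_pairs  # one row: colA 3, colB "A"
```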


See Arun's answer for details on why keying is usually not necessary to improve performance in joins. It says:

Generally, unless there are repeated grouping / joining operations performed on the same keyed data.table, there should be no discernible difference.

I usually only set keys when I am doing joins interactively, so I can skip typing out on=.



