Querying a data.table by key in R
I followed the data.table introduction. There, a key is set on the x column of the data.table and the table is then queried by that key. I tried setting the key on the v column instead, and the query does not return the row I expect. Any ideas on what I am doing wrong?
> set.seed(34)
> DT = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
> DT
x v
1: b -0.1388900
2: b 1.1998129
3: b -0.7477224
4: a -0.5752482
5: a -0.2635815
> setkey(DT,v)
> DT[1.1998129,]
x v
1: b -0.7477224
EXPECTED:
x v
1: b 1.1998129
When the first argument of [.data.table is a number, it does not do a join; it simply looks up that row number. After setkey, your data.table is sorted by v and looks like this:
DT
# x v
#1: b -0.7477224
#2: a -0.5752482
#3: a -0.2635815
#4: b -0.1388900
#5: b 1.1998129
And since as.integer(1.1998129) is 1, you get the first row.
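The row-number behaviour described above can be checked directly. This is a minimal sketch, assuming the same seed and table as in the question; the names by_number and by_row are introduced here only for illustration.

```r
library(data.table)

# Rebuild the table from the question and key it on the numeric column.
set.seed(34)
DT <- data.table(x = c("b", "b", "b", "a", "a"), v = rnorm(5))
setkey(DT, v)  # sorts DT by v in ascending order

# A bare number in i is treated as a row number, not as a key value,
# so both of these ask for the first row of the sorted table.
by_number <- DT[1.1998129]
by_row <- DT[1L]

print(by_number)
print(by_row)
```

Both calls should print the row holding the smallest v, because the key sorts the table in ascending order.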
Now, if you intended to do a join, you must use the syntax DT[J(...)] or DT[.(...)], and this will work as expected, provided that you use the correct number (as a convenience, you do not need J when working with character columns, because DT["a"] has no other default meaning):
DT[J(v[5])]
# x v
#1: b 1.199813
Please note that DT[J(1.1998129)] won't work because:
DT$v[5] == 1.1998129
#[1] FALSE
You can print more digits, and then it will work:
options(digits = 22)
DT$v[5]
#[1] 1.199812896606383683107
DT$v[5] == 1.199812896606383683107
#[1] TRUE
DT[J(1.199812896606383683107)]
# x v
#1: b 1.199812896606383683107
But there is an additional subtlety worth noting here: R and data.table use different precisions when deciding whether two floating point numbers are equal:
DT$v[5] == 1.19981289660638
#[1] FALSE
DT[J(1.19981289660638)]
# x v
#1: b 1.199812896606379908349
In short, be careful when joining on floating point numbers.
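If the exact stored value is not available, one alternative is to filter with an explicit tolerance instead of joining. This is only a sketch under assumptions: target is the value as printed in the question, and tol is a hypothetical tolerance chosen for this example. Note that it is an ordinary vector scan over v, not a keyed join.

```r
library(data.table)

# Same table as in the question.
set.seed(34)
DT <- data.table(x = c("b", "b", "b", "a", "a"), v = rnorm(5))

target <- 1.1998129  # value as printed, rounded to 7 decimal places
tol <- 1e-6          # hypothetical tolerance; pick one suited to your data

# Keep the rows whose v lies within tol of the target.
hits <- DT[abs(v - target) < tol]
print(hits)
```

The tolerance has to be smaller than the gap between neighbouring values of v, otherwise more than one row may match.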