How do I create a Python-style dictionary over a data.table in R?
I'm looking for a Python-like dictionary structure in R to replace values in a large dataset (> 100MB), and I think the data.table package can help me. However, I cannot find an easy way to solve the problem.
For example, I have two data.tables:
Table A:
V1 V2
1: A B
2: C D
3: C D
4: B C
5: D A
Table B:
V3 V4
1: A 1
2: B 2
3: C 3
4: D 4
I want to use B as a dictionary to replace values in A. Thus, I want to get:
Table R:
V5 V6
1 2
3 4
3 4
2 3
4 1
What I've done:
c2=tB[tA[,list(V2)],list(V4)]
c1=tB[tA[,list(V1)],list(V4)]
Although I specified j = list(V4), it still returned the V3 values as well. I do not know why.
c2:
V3 V4
1: B 2
2: D 4
3: D 4
4: C 3
5: A 1
c1:
V3 V4
1: A 1
2: C 3
3: C 3
4: B 2
5: D 4
Finally, I combined the two V4 columns and got the result I want.
But I think there must be a much easier way to do this. Any ideas?
Here's an alternative way:
setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA]$V4)
}
# V1 V2
# 1: 1 2
# 2: 3 4
# 3: 3 4
# 4: 2 3
# 5: 4 1
Since thisA is a character column, we don't need to wrap it in J() (this is just a convenience). Here the columns of A are replaced by reference, so the approach is also memory efficient. If you don't want to modify A, you can make a copy with cA <- copy(A) and replace the columns of cA instead.
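To make the by-reference point concrete, here is a minimal, self-contained sketch. It assumes the two tables from the question are rebuilt by hand as A and B, and it uses the name cA from the text for the copy. It is only meant to illustrate that copy() protects the original from set(), not to be a definitive implementation.

```r
library(data.table)

# Tables from the question, rebuilt by hand (assumed layout).
A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)
setkey(B, V3)

# Work on a copy so that A itself is left untouched.
cA <- copy(A)
for (i in seq_along(cA)) {
    # Keyed lookup of the whole column in B; set() then replaces column i of cA by reference.
    set(cA, j = i, value = B[J(cA[[i]])]$V4)
}

cA$V1   # 1 3 3 2 4
A$V1    # still "A" "C" "C" "B" "D"
```

Without the copy() call, the same loop run directly on A would overwrite its columns in place.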
Alternatively, using := :
A[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
# or
ans = copy(A)[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
(Following a comment from user 2923419): you can leave out J() if the lookup is on a single character column (just for convenience).
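The lookups above depend on B being keyed on V3. As a hedged alternative, more recent data.table releases (1.9.6 and later, which is newer than the version discussed here) accept an on= argument that performs the same join without setting a key. The sketch below assumes the question's tables rebuilt by hand; the names cols and ans are only illustrative.

```r
library(data.table)

# Tables from the question, rebuilt by hand (assumed layout). No setkey() call here.
A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)

cols <- names(A)

# For each column of the copy, join its values against B$V3 and keep the matching V4.
ans <- copy(A)[, (cols) := lapply(.SD, function(x) B[.(V3 = x), on = "V3"]$V4)]

ans$V1   # 1 3 3 2 4
ans$V2   # 2 4 4 3 1
```

Naming the lookup column V3 inside .() is what lets on = "V3" match it against B.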
In 1.9.3, when j is a single column, it returns a vector (a change made following a user request). So this is slightly more natural data.table syntax:
setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA, V4])
}
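For comparison with the Python dictionary the question mentions, the closest base R analogue is probably a named vector, indexed by name. This is a different technique from the keyed data.table join above and is shown only as a rough sketch. The names dict and A_df are hypothetical, and how it performs on a 100MB dataset has not been measured here.

```r
# Dictionary built from the question's Table B: names are the keys, elements are the values.
dict <- c(A = 1L, B = 2L, C = 3L, D = 4L)

# Table A from the question as a plain data.frame of character columns.
A_df <- data.frame(V1 = c("A", "C", "C", "B", "D"),
                   V2 = c("B", "D", "D", "C", "A"),
                   stringsAsFactors = FALSE)

# Indexing by name looks up every element at once; unname() drops the key labels.
A_df[] <- lapply(A_df, function(x) unname(dict[x]))

A_df$V1   # 1 3 3 2 4
A_df$V2   # 2 4 4 3 1
```

Keys that are missing from dict come back as NA rather than raising an error, which differs from Python's dict[key] behavior.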