How do I create a Python-style dictionary over a data.table in R?

I'm looking for a Python-like dictionary structure in R to replace values in a large dataset (> 100MB), and I think the data.table package can help me. However, I cannot find an easy way to solve the problem.

For example, I have two data.tables:

Table A:

   V1 V2
1:  A  B
2:  C  D
3:  C  D
4:  B  C
5:  D  A

      

Table B:

   V3 V4
1:  A  1
2:  B  2
3:  C  3
4:  D  4

      

I want to use B as a dictionary to replace the values in A. Thus, I want to get:

Table R:

V5 V6
 1  2
 3  4
 3  4
 2  3
 4  1

      

What I've done:

c2=tB[tA[,list(V2)],list(V4)]
c1=tB[tA[,list(V1)],list(V4)]

      

Although I specified j = list(V4), the result still included the V3 values. I do not know why.

c2:

   V3 V4
1:  B  2
2:  D  4
3:  D  4
4:  C  3
5:  A  1

      

c1:

   V3 V4
1:  A  1
2:  C  3
3:  C  3
4:  B  2
5:  D  4

      

Finally, I combined the two V4 columns and got the result I want.

But I think there must be a much easier way to do this. Any ideas?


2 answers


Here's an alternative way:

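setkey(B, V3)                      # key B on the lookup column
for (i in seq_along(A)) {          # loop over the columns of A
    thisA = A[[i]]
    # look up each value of column i in B and overwrite the column by reference
    set(A, j=i, value=B[thisA]$V4)
}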
#    V1 V2
# 1:  1  2
# 2:  3  4
# 3:  3  4
# 4:  2  3
# 5:  4  1

      

Since thisA is a character column, we don't need J() (this is just a convenience). Here the columns of A are replaced by reference, which is also memory efficient. But if you don't want to modify A, you can use cA <- copy(A) and replace the columns of cA instead.
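As a self-contained sketch of the copy() variant just described, the following rebuilds the two example tables from the question and recodes a copy, leaving A untouched. The table construction is an assumption based on the printed tables in the question.

```r
library(data.table)

# Example tables reconstructed from the question
A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)
setkey(B, V3)

# Work on a deep copy so that A itself is not modified
cA <- copy(A)
for (i in seq_along(cA)) {
    # keyed lookup of column i in B, written back into cA by reference
    set(cA, j = i, value = B[cA[[i]]]$V4)
}

print(cA)  # recoded values
print(A)   # still holds the original letters
```

Without copy(), cA <- A would only create another name for the same table, so the loop would change A as well.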


Alternatively, using := :



A[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
# or
ans = copy(A)[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]

      

(Following a comment from user2923419): you can leave out J() when the lookup is on a single character column (just for convenience).


In 1.9.3, when j is a single column, it returns a vector (a change made following a user request). So this is slightly more natural data.table syntax:

setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA, V4])
}
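
For readers on a newer data.table, the same lookup can probably be written without setkey() by passing the join columns through the on= argument. This is only a sketch and not part of the original answer: it assumes data.table 1.9.6 or later (where on= is available), and the result name R is taken from the question's "Table R".

```r
library(data.table)

# Example tables reconstructed from the question; B is left unkeyed
A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)

R <- copy(A)  # write results into a copy so A keeps its letters
for (col in names(A)) {
    # join B to A where B$V3 equals A[[col]]; with j = V4 the join
    # returns a plain vector, one element per row of A, in A's row order
    set(R, j = col, value = B[A, on = c(V3 = col), V4])
}
print(R)
```

The named character vector c(V3 = col) tells the join to match column V3 of B against the column of A whose name is stored in col.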

      



I'm not sure how fast this is with big data, but chmatch should be fast.



tA[ , lapply(.SD,function(x) tB$V4[chmatch(x,tB$V3)])]

   V1 V2
1:  1  2
2:  3  4
3:  3  4
4:  2  3
5:  4  1
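
One thing worth checking with this approach is what happens to values that have no entry in the dictionary. The sketch below uses a shortened version of the question's tables plus a made-up value "Z" that is absent from tB (an assumption added purely for illustration); chmatch returns NA for it, so the recoded cell becomes NA rather than raising an error.

```r
library(data.table)

# "Z" is a hypothetical value with no entry in tB
tA <- data.table(V1 = c("A", "C", "Z"),
                 V2 = c("B", "D", "A"))
tB <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)

# chmatch gives the position of each value of x in tB$V3 (NA if absent);
# indexing tB$V4 with those positions performs the dictionary lookup
tR <- tA[, lapply(.SD, function(x) tB$V4[chmatch(x, tB$V3)])]
print(tR)
```

If missing keys should be treated as an error, a check such as anyNA(tR) after the lookup is one simple option.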

      
