Concatenate all columns by reference in a data.table

Question

Good day to all,

I would like to concatenate (join) two data.table objects by reference without writing out every variable I want to bring over. Here's a simple example to illustrate my needs:

> set.seed(20170711)
> (a <- data.table(v_key=seq(1, 5), key="v_key"))
   v_key
1:     1
2:     2
3:     3
4:     4
5:     5

> a_backup <- copy(a)

> (b <- data.table(v_key=seq(1, 5), v1=runif(5), v2=runif(5), v3=runif(5), key="v_key"))
   v_key          v1        v2          v3
1:     1 0.141804303 0.1311052 0.354798849
2:     2 0.425955903 0.3635612 0.950234261
3:     3 0.001070379 0.4615936 0.359660693
4:     4 0.453054854 0.5768500 0.008470552
5:     5 0.951767837 0.1649903 0.565894298

I want to copy all columns of b into a by reference, without specifying the column names.

I could do the following, but that would make a copy of the object for no reason, decreasing the performance of my program and increasing the required RAM:

> (a <- a[b])
   v_key          v1        v2          v3
1:     1 0.141804303 0.1311052 0.354798849
2:     2 0.425955903 0.3635612 0.950234261
3:     3 0.001070379 0.4615936 0.359660693
4:     4 0.453054854 0.5768500 0.008470552
5:     5 0.951767837 0.1649903 0.565894298
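
One way to check the copy concern is to compare memory addresses with data.table's address() function. This is only a minimal sketch: the small tables and the names addr_a and joined are made up for illustration and are not part of the original example.

```r
library(data.table)

# Hypothetical small tables, keyed the same way as in the question
a <- data.table(v_key = 1:5, key = "v_key")
b <- data.table(v_key = 1:5, v1 = (1:5) / 10, key = "v_key")

addr_a <- address(a)   # memory address of a before the join
joined <- a[b]         # the join result is a separate data.table

# Different addresses mean the join allocated a new object
address(joined) != addr_a
```

Assigning the result back with a <- a[b] only rebinds the name a to that new object. The original table is not modified in place.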

Another option (without a useless copy) is to specify the name of each column of b, which results in the following:

> a <- copy(a_backup)
> a[b, `:=`(
+   v1=v1,
+   v2=v2,
+   v3=v3
+ )][]
   v_key          v1        v2          v3
1:     1 0.141804303 0.1311052 0.354798849
2:     2 0.425955903 0.3635612 0.950234261
3:     3 0.001070379 0.4615936 0.359660693
4:     4 0.453054854 0.5768500 0.008470552
5:     5 0.951767837 0.1649903 0.565894298

In a nutshell, I would like the efficiency of my second example (no useless copy) without having to specify all the column names of b.

I think I could find a way to do this using a combination of colnames() and get(), but I'm wondering if there is a cleaner way, since clean syntax matters a lot to me.

Thanks everyone!

Lrd

+3

r data.table

Jp Le Cavalier 11 jul. 17 at 20:04

2 answers

Building on the question here, I always like Reduce for this kind of situation:

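# provide list of DTs to be merged (the `...` is a placeholder for your tables)
arbitrary.dts <- list(...)

# merge the tables pairwise on the shared key column
a <- Reduce(function(x, y) merge(x, y, all=TRUE,
    by=c("v_key")), arbitrary.dts, accumulate=FALSE)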

Just one idea (I always like to start with base functionality). I'm sure there is a much slicker way to do this with data.table.
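
To make the Reduce idea concrete, here is a minimal runnable sketch. The tables dt1, dt2 and dt3 and their values are hypothetical stand-ins for the elided list. Note that merge() builds a new table at every step, so this approach does not work by reference.

```r
library(data.table)

# Hypothetical tables sharing the key column v_key
dt1 <- data.table(v_key = 1:3, key = "v_key")
dt2 <- data.table(v_key = 1:3, v1 = c(0.1, 0.2, 0.3), key = "v_key")
dt3 <- data.table(v_key = 1:3, v2 = c("x", "y", "z"), key = "v_key")

dts <- list(dt1, dt2, dt3)

# Fold the list pairwise: merge(dt1, dt2), then merge that result with dt3
merged <- Reduce(function(x, y) merge(x, y, by = "v_key", all = TRUE), dts)

names(merged)
```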

0

cmaher 11 jul. 17 at 20:20

Jealie · Accepted Answer · 2017-07-11T23:45:31+0000

As you mentioned, a combination of colnames and mget could do the job.

Consider this:

# retrieve the column names from b - without the key ('v_key')
thecols = setdiff(colnames(b), key(b))

# assign them to a
a[b, (thecols) := mget(thecols)]

It's not that bad, is it?

Also, I don't think any other syntax is currently implemented in data.table. But I would be glad to be wrong :)
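
For completeness, here is a self-contained sketch of the same approach, rebuilt from the tables in the question. It differs from the snippet above in one respect: it uses the i. prefix inside mget() so the lookup explicitly targets the columns of b. It then uses address() to check that a was updated in place. The name addr_before is made up for illustration.

```r
library(data.table)

set.seed(20170711)
a <- data.table(v_key = seq(1, 5), key = "v_key")
b <- data.table(v_key = seq(1, 5), v1 = runif(5), v2 = runif(5), v3 = runif(5),
                key = "v_key")

addr_before <- address(a)   # where a lives before the update

# All columns of b except its key
thecols <- setdiff(colnames(b), key(b))

# Join on the shared key and add b's columns to a by reference;
# "i.v1" etc. refer to the columns of the table passed as i (here b)
a[b, (thecols) := mget(paste0("i.", thecols))]

identical(address(a), addr_before)   # same address: a was modified in place
colnames(a)
```

The i. prefix should only matter when a already has a column with the same name as one in b, but it makes the intent explicit.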
