Expand the list of columns data.tables
I have data.table
with a list column where each item is data.table
:
dt <- data.table(id = c(1, 1, 2),
var = list(data.table(a = c(1, 2), b = c(3, 4)),
data.table(a = c(5, 6), b = c(7, 8)),
data.table(a = 9, b = 10)))
dt
# id var
# 1: 1 <data.table>
# 2: 1 <data.table>
# 3: 2 <data.table>
Now I want to "lock" this structure so that:
a b id
1: 1 3 1
2: 2 4 1
3: 5 7 1
4: 6 8 1
5: 9 10 2
I know how to expand the inline part data.table
using rbindlist
, but just don't know how to bind the flattened one data.table
to the "id" variable.
The original dataset is 30 million rows and dozens of variables, so I would really appreciate it if you could come up with a solution that is not only workable, but also memory efficient.
source to share
In this case it works dt[, var[[1]], by=id]
. However, I use rbindlist
as the OP pointed out:
dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), id="r")]
Then concatenate on r
(lines dt
) if you really need any vars:
res[dt, on=.(r), `:=`(id = i.id)]
This is better than dt[, var[[1]], by=id]
several ways:
-
rbindlist
should be faster than anything with a lot of groupsby=
. - If there are
dt
more vars, they should all be inby=
. - There is probably no need to transfer vars from
dt
at all, since they can always be grabbed from this table later, and they take up much less memory there.
source to share