Expand the list of columns data.tables
I have data.table
with a list column where each item is data.table
:
dt <- data.table(id = c(1, 1, 2),
var = list(data.table(a = c(1, 2), b = c(3, 4)),
data.table(a = c(5, 6), b = c(7, 8)),
data.table(a = 9, b = 10)))
dt
# id var
# 1: 1 <data.table>
# 2: 1 <data.table>
# 3: 2 <data.table>
Now I want to "lock" this structure so that:
a b id
1: 1 3 1
2: 2 4 1
3: 5 7 1
4: 6 8 1
5: 9 10 2
I know how to expand the inline part data.table
using rbindlist
, but just don't know how to bind the flattened one data.table
to the "id" variable.
The original dataset is 30 million rows and dozens of variables, so I would really appreciate it if you could come up with a solution that is not only workable, but also memory efficient.
In this case it works dt[, var[[1]], by=id]
. However, I use rbindlist
as the OP pointed out:
dt[, r := as.character(.I) ]
res <- dt[, rbindlist(setNames(var, r), id="r")]
Then concatenate on r
(lines dt
) if you really need any vars:
res[dt, on=.(r), `:=`(id = i.id)]
This is better than dt[, var[[1]], by=id]
several ways:
-
rbindlist
should be faster than anything with a lot of groupsby=
. - If there are
dt
more vars, they should all be inby=
. - There is probably no need to transfer vars from
dt
at all, since they can always be grabbed from this table later, and they take up much less memory there.