Expand a list column of data.tables

I have a data.table with a list column where each item is itself a data.table:

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

dt
#    id          var
# 1:  1 <data.table>
# 2:  1 <data.table>
# 3:  2 <data.table>

      

Now I want to expand (flatten) this structure so that I get:

   a  b id
1: 1  3  1
2: 2  4  1
3: 5  7  1
4: 6  8  1
5: 9 10  2

      

I know how to expand the nested data.tables using rbindlist, but I don't know how to bind the flattened data.table back to the "id" variable.

The original dataset is 30 million rows and dozens of variables, so I would really appreciate it if you could come up with a solution that is not only workable, but also memory efficient.


1 answer


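When every id occupies a single row, dt[, var[[1]], by=id] works; with repeated ids, as in the example, var[[1]] keeps only the first table of each group, so dt[, rbindlist(var), by=id] is needed instead. However, I would use rbindlist on the whole column, as the OP suggested: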

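# r is the row number of dt, stored as character so it can serve as list names
dt[, r := as.character(.I) ]
# idcol (spelled out rather than the partially matched id=) records which row each piece came from
res <- dt[, rbindlist(setNames(var, r), idcol="r")]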

      

Then join on r (the row numbers of dt) if you really need any of its other variables:



res[dt, on=.(r), `:=`(id = i.id)]

      

This is better than dt[, var[[1]], by=id] in several ways:

  • rbindlist should be faster than anything that uses by= with a lot of groups.
  • If dt has more variables, they would all have to go into by=.
  • There is probably no need to carry the variables over from dt at all, since they can always be grabbed from that table later, and they take up much less memory there.
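
Putting the steps together, a minimal end-to-end sketch on the example data from the question might look like the following. The final cleanup (dropping the helper column r) and the by= cross-check are illustrative additions that are not part of the answer above; the rest follows the rbindlist-then-join approach.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Label each nested table with its row number in dt (as character,
# because it is used as the list names below)
dt[, r := as.character(.I)]

# Stack all nested tables in one call; idcol stores the list names in column r
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]

# Update join: copy id from dt into res by reference, matching on r
res[dt, on = .(r), id := i.id]

# Illustrative cleanup: drop the helper column once id has been attached
res[, r := NULL]

# Cross-check (an assumption-free but slower route for many groups):
# stack the tables group by group instead
alt <- dt[, rbindlist(var), by = id]

res
```

With the example data, res should contain the five rows and the columns a, b, id shown in the question. On a table with tens of millions of rows, it may be worth skipping the join entirely and keeping r as the link back to dt, as the last bullet suggests.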