Expand a list column of an R data.frame, keeping the rest of the values in the row

Question

I need to effectively "expand" a list column in an R data.frame. For example, if I have a data.frame defined as:

dbt <- data.frame(values=c(1,1,1,1,2,3,4), 
                  parm1=c("A","B","C","A","B","C","B"),
                  parm2=c("d","d","a","b","c","a","a"))

Then suppose an analysis produces one column as a list, similar to the following:

agg <- aggregate(values ~ parm1 + parm2, data=dbt, 
                 FUN=function(x) {return(list(x))})

The resulting data.frame looks like this (where class(agg$values) == "list"):

  parm1 parm2 values
1     B     a      4
2     C     a   1, 3
3     A     b      1
4     B     c      2
5     A     d      1
6     B     d      1

I would like to expand the "values" column efficiently, repeating the parm1 and parm2 values (adding more rows) for each list item, across all rows of the data.frame.

Currently, I have written a function that performs the unrolling in a for loop, called via apply. This is really inefficient (the aggregated data.frame takes about an hour to create and almost 24 hours to unroll; the fully expanded data has ~500k records). The top-level call I'm using is:

unrolled.data <- do.call(rbind, apply(agg, 1, FUN=unroll.data))

The function simply calls unlist() on the values column of the row and then builds a data.frame in a for loop as the return object.

The environment is somewhat restricted: the tidyr, data.table and splitstackshape libraries are not available to me, so I can use only functions found in base::, and only those available in v3.1.1 and earlier. So the answers to this (not exactly duplicate) question don't apply.

Any suggestions for something faster?

Thanks!

list r dataframe

TimH 22 jul. 15 at 12:23

1 answer

jenesaisquoi · Accepted Answer · 2015-07-22T00:39:56+0000

With base R, you can try

with(agg, {
    data.frame(
        lapply(agg[,1:2], rep, times=lengths(values)),
        values=unlist(values)
    )
})
#      parm1 parm2 values
# 1.2      B     a      4
# 1.31     C     a      1
# 1.32     C     a      3
# 2.1      A     b      1
# 3.2      B     c      2
# 4.1      A     d      1
# 4.2      B     d      1
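
One caveat for the v3.1.1 constraint in the question: lengths() was only added in R 3.2.0, so it is not available there. A minimal sketch of the same idea without it, assuming vapply(values, length, integer(1)) as a drop-in for lengths(values). Here agg is rebuilt by hand to mirror the aggregate() result shown in the question, and the names n and expanded are only illustrative:

```r
# Hand-built stand-in for the aggregate() result in the question:
# two grouping columns plus a list column called "values".
agg <- data.frame(parm1 = c("B", "C", "A", "B", "A", "B"),
                  parm2 = c("a", "a", "b", "c", "d", "d"),
                  stringsAsFactors = FALSE)
agg$values <- list(4, c(1, 3), 1, 2, 1, 1)

# Number of elements in each list item (substitute for lengths() on R < 3.2.0).
n <- vapply(agg$values, length, integer(1))

# Repeat each grouping value once per list element, then flatten the list.
expanded <- data.frame(
    lapply(agg[, 1:2], rep, times = n),
    values = unlist(agg$values),
    stringsAsFactors = FALSE
)
```

The result has one row per list element (7 rows for this example), with parm1 and parm2 repeated alongside each value.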

Timings against an alternative (thanks @thelatemail)

library(dplyr)
agg %>%
  sample_n(1e7, replace=T) -> bigger

system.time(
    with(bigger, { data.frame(lapply(bigger[,1:2], rep, times=lengths(values)), values=unlist(values)) })
)
# user  system elapsed 
# 3.78    0.14    3.93 

system.time(
    with(bigger, { data.frame(bigger[rep(rownames(bigger), lengths(values)), 1:2], values=unlist(values)) })
)
# user  system elapsed 
# 11.30    0.34   11.64
