Expand a list column of an R data.frame, keeping the rest of the values in the row
I need to effectively "expand" a list column of an R data.frame. For example, if I have a data.frame defined as:
dbt <- data.frame(values=c(1,1,1,1,2,3,4),
parm1=c("A","B","C","A","B","C","B"),
parm2=c("d","d","a","b","c","a","a"))
Then suppose an analysis generates one column as a list, similar to the following:
agg <- aggregate(values ~ parm1 + parm2, data=dbt,
FUN=function(x) {return(list(x))})
The resulting data.frame looks like this (where class(agg$values) == "list"):
parm1 parm2 values
1 B a 4
2 C a 1, 3
3 A b 1
4 B c 2
5 A d 1
6 B d 1
I would like to expand the "values" column efficiently, repeating the values of parm1 and parm2 (adding more rows) for each list item across all rows of the data.frame.
Currently, I have written a function that does the unrolling in a for loop, called from an apply. This is really inefficient (the aggregated data.frame takes about an hour to create and almost 24 hours to unroll; the fully expanded data has ~500k records). The top-level call I'm using is:
unrolled.data <- do.call(rbind, apply(agg, 1, FUN=unroll.data))
The function simply calls unlist() on the values column object and then builds the data.frame to return in a for loop.
The environment is somewhat restricted and the tidyr, data.table and splitstackshape libraries are not available to me. This limits me not only to functions found in base::, but to those available in v3.1.1 and earlier. So the answers to this (not exactly duplicate) question don't apply.
Any suggestions for something faster?
Thanks!
With base R, you can try
with(agg, {
data.frame(
lapply(agg[,1:2], rep, times=lengths(values)),
values=unlist(values)
)
})
# parm1 parm2 values
# 1.2 B a 4
# 1.31 C a 1
# 1.32 C a 3
# 2.1 A b 1
# 3.2 B c 2
# 4.1 A d 1
# 4.2 B d 1
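One caveat given the version constraint in the question: lengths() was added in R 3.2.0, so it is not available in v3.1.1. A minimal sketch of the same idea that should run on older versions, swapping in vapply(values, length, integer(1)) for lengths(). The list column is built by hand here so the example is self-contained, and the names n, idx and unrolled are illustrative only:

```r
# Hand-built stand-in for the aggregated data in the question
# (assumption: same shape as agg, with a list column named "values").
agg <- data.frame(parm1 = c("B", "C", "A", "B", "A", "B"),
                  parm2 = c("a", "a", "b", "c", "d", "d"),
                  stringsAsFactors = FALSE)
agg$values <- list(4, c(1, 3), 1, 2, 1, 1)

# Number of items in each list element; replaces lengths() for R < 3.2.0.
n <- vapply(agg$values, length, integer(1))

# Repeat each row index once per list item, then attach the flattened values.
idx <- rep(seq_len(nrow(agg)), times = n)
unrolled <- data.frame(agg[idx, c("parm1", "parm2")],
                       values = unlist(agg$values))
rownames(unrolled) <- NULL  # drop the "2.1"-style row names left by indexing
```

Everything is vectorised, so no per-row data.frame is created and nothing is rbind-ed together.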
Timing compared with an alternative (thanks @thelatemail)
library(dplyr)
agg %>%
sample_n(1e7, replace=T) -> bigger
system.time(
with(bigger, { data.frame(lapply(bigger[,1:2], rep, times=lengths(values)), values=unlist(values)) })
)
# user system elapsed
# 3.78 0.14 3.93
system.time(
with(bigger, { data.frame(bigger[rep(rownames(bigger), lengths(values)), 1:2], values=unlist(values)) })
)
# user system elapsed
# 11.30 0.34 11.64
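If dplyr is not available either, a comparable test set can be built with base sample(). The following is only a sketch: the hand-built agg, the reduced size of 1e5 rows, and the names bigger, n, a and b are assumptions for illustration, and the timings will differ from those shown above.

```r
# Hand-built stand-in for agg (assumption: same shape as in the question).
agg <- data.frame(parm1 = c("B", "C", "A", "B", "A", "B"),
                  parm2 = c("a", "a", "b", "c", "d", "d"),
                  stringsAsFactors = FALSE)
agg$values <- list(4, c(1, 3), 1, 2, 1, 1)

# Resample rows with replacement using base R only.
set.seed(1)
bigger <- agg[sample(nrow(agg), 1e5, replace = TRUE), ]
n <- vapply(bigger$values, length, integer(1))

# Approach 1: repeat each column separately.
t1 <- system.time(
  a <- data.frame(lapply(bigger[, 1:2], rep, times = n),
                  values = unlist(bigger$values),
                  stringsAsFactors = FALSE)
)

# Approach 2: repeat whole rows by index.
t2 <- system.time(
  b <- data.frame(bigger[rep(seq_len(nrow(bigger)), n), 1:2],
                  values = unlist(bigger$values),
                  stringsAsFactors = FALSE)
)

print(rbind(columns = t1, rows = t2))
```

Both approaches should yield the same columns, so the comparison only concerns speed.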