Create aggregate output data.table from a function that returns multiple outputs

I am struggling to solve a specific problem. I have searched Stack Overflow and found examples that are close, but not quite what I want. The example that comes closest is here.

This post (here) is also close, but I can't get my multiple-output function to work with list().

What I want to do is create a table of aggregated values (min, max, mean, MyFunc) grouped by a key. I also have some complex functions that return multiple outputs. I could return the outputs separately, but that would mean executing a complex function many times and would take too long.

Using Matt Dowle's example from this post with some modifications ...

x <- data.table(a=1:3,b=1:6)[]
   a b
1: 1 1
2: 2 2
3: 3 3
4: 1 4
5: 2 5
6: 3 6

      

This is the type of output I want. Aggregate table (here with mean and sum only)

agg.dt <- x[ , list(mean=mean(b), sum=sum(b)), by=a][]
   a mean sum
1: 1  2.5   5
2: 2  3.5   7
3: 3  4.5   9

      

In this example, the function f returns 3 outputs. My real function is much more complex and the components cannot be separated like that.

f <- function(x) {list(length(x), min(x), max(x))}

      

Matt Dowle's suggestion in the previous post works great, but it doesn't create an aggregated (rolled-up) table. Instead, the aggregates are added as columns to the main table (which is also very useful in other circumstances).

x[, c("length","min", "max"):= f(b), by=a][]
   a b length min max
1: 1 1      2   1   4
2: 2 2      2   2   5
3: 3 3      2   3   6
4: 1 4      2   1   4
5: 2 5      2   2   5
6: 3 6      2   3   6

      

What I really want to do (if possible) is something like this ...

agg.dt <- x[ , list(mean=mean(b)
                       , sum=sum(b)
                       , c("length","min", "max") = f(b)
), by=a]

      

and return an aggregate table that looks something like this:

   a mean sum length min max
1: 1  2.5   5      2   1   4
2: 2  3.5   7      2   2   5
3: 3  4.5   9      2   3   6

      

The only solution I can see is a two-step process: build the aggregates separately and then merge the tables together. Is there a way to do this in one step?
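For reference, the two-step process mentioned above could be sketched roughly as follows. This is only an illustration: it assumes f() returns a named list (the version in the question is unnamed), and agg1 and agg2 are hypothetical intermediate names.

```r
library(data.table)

x <- data.table(a = 1:3, b = 1:6)

# Assumption: f() returns a *named* list so the columns get usable names
f <- function(x) list(length = length(x), min = min(x), max = max(x))

# Step 1: the simple aggregates, one row per group
agg1 <- x[, list(mean = mean(b), sum = sum(b)), by = a]

# Step 2: the multi-output function, called once per group
agg2 <- x[, f(b), by = a]

# Join the two results on the grouping column
agg.dt <- merge(agg1, agg2, by = "a")
```

This calls f() only once per group, but it makes two passes over the groups and needs a join at the end.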



1 answer


library(data.table)
x <- data.table(a=1:3,b=1:6)
#have the function return a named list
f <- function(x) {list(length=length(x), 
                       min=min(x), 
                       max=max(x))}

# c can combine lists
# c(vector, vector, 3-list) is a 5-list
agg.dt <- x[ , c(mean=mean(b),
                 sum=sum(b),
                 f(b)), 
            by=a]

#   a mean sum length min max
#1: 1  2.5   5      2   1   4
#2: 2  3.5   7      2   2   5
#3: 3  4.5   9      2   3   6

      

Alternatively, strip the names from f() to save the time and cost of creating the same names for each group:

f <- function(x) {list(length(x), 
                       min(x), 
                       max(x))}

agg.dt <- x[ , c(mean(b),
                 sum(b),
                 f(b)),
            by=a]

setnames(agg.dt, c("a", "mean","sum","length", "min", "max"))
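As a quick sanity check, the named and unnamed variants can be compared directly. This is a sketch only; f_named, f_plain, named and plain are hypothetical names introduced here so both versions can live side by side.

```r
library(data.table)

x <- data.table(a = 1:3, b = 1:6)

# Hypothetical names for the two versions of f() shown above
f_named <- function(v) list(length = length(v), min = min(v), max = max(v))
f_plain <- function(v) list(length(v), min(v), max(v))

# Variant 1: names are created inside j for every group
named <- x[, c(mean = mean(b), sum = sum(b), f_named(b)), by = a]

# Variant 2: no names inside j; they are set once afterwards
plain <- x[, c(mean(b), sum(b), f_plain(b)), by = a]
setnames(plain, c("a", "mean", "sum", "length", "min", "max"))

# The two tables should hold the same columns and values
same <- isTRUE(all.equal(named, plain))
```

Whether the unnamed variant is noticeably faster depends on the number of groups, so it is worth timing on your own data.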

      



This drop-the-names-and-put-them-back-afterwards trick (for speed when you have a lot of groups) doesn't reach inside f(). f() can return anything, which makes it difficult for data.table to optimize.

Just to mention that base::list() no longer copies named inputs, as of R 3.1. So the common R idiom of a function f() doing some complicated steps and then returning its local variables in a list() at the end should now be faster.
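The idiom described above might look like the following minimal sketch. The function g() is a hypothetical stand-in for an expensive computation: the shared work (here a sort) is done once, and the local variables are returned together in a list().

```r
library(data.table)

x <- data.table(a = 1:3, b = 1:6)

# Hypothetical stand-in for a complex function: do the shared work once,
# then return the local variables as a named list
g <- function(v) {
  s   <- sort(v)          # the "expensive" shared step
  lo  <- s[1L]
  hi  <- s[length(s)]
  rng <- hi - lo
  list(lo = lo, hi = hi, range = rng)
}

# Combine the simple aggregates with the list returned by g()
agg.dt <- x[, c(list(mean = mean(b), sum = sum(b)), g(b)), by = a]
```

Wrapping the simple aggregates in list() before calling c() makes it explicit that j returns one list with one element per output column.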
