Return multiple columns from data aggregation.
I would like to use data.table as an alternative to aggregate() or ddply(), as these two methods do not scale to large objects as efficiently as hoped. Unfortunately, I have not figured out how to get vector-returning aggregate functions to generate multiple columns in the result from data.table. For example:
# required packages
library(plyr)
library(data.table)
# simulated data
x <- data.table(value=rnorm(100), g=rep(letters[1:5], each=20))
# ddply output that I would like to get from data.table
ddply(data.frame(x), 'g', function(i) quantile(i$value))
g 0% 25% 50% 75% 100%
1 a -1.547495 -0.7842795 0.202456288 0.6098762 2.223530
2 b -1.366937 -0.4418388 -0.085876995 0.7826863 2.236469
3 c -2.064510 -0.6411390 -0.257526983 0.3213343 1.039053
4 d -1.773933 -0.5493362 -0.007549273 0.4835467 2.116601
5 e -0.780976 -0.2315245 0.194869630 0.6698881 2.207800
# not quite what I am looking for:
x[, quantile(value), by=g]
g V1
1: a -1.547495345
2: a -0.784279536
3: a 0.202456288
4: a 0.609876241
5: a 2.223529739
6: b -1.366937074
7: b -0.441838791
8: b -0.085876995
9: b 0.782686277
10: b 2.236468703
Essentially, the output from ddply and aggregate is in wide format, and the output from data.table is in long format. Is the answer to reshape the data, or is there some additional argument I can pass to my data.table object?
Try coercing the result to a list:
> x[, as.list(quantile(value)), by=g]
g 0% 25% 50% 75% 100%
1: a -1.7507334 -0.632331909 0.07435249 0.7459778 1.428552
2: b -2.2043481 -0.005652353 0.10534325 0.5769475 1.241754
3: c -1.9313985 -1.120737610 -0.26116926 0.6953009 1.360017
4: d -0.7434664 -0.055232431 0.22062823 1.1864389 3.021124
5: e -2.0101657 -0.468674094 0.20209610 0.6286448 2.433152
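This works because data.table makes one column per element when j evaluates to a list, while a plain vector becomes a single column (V1) with one row per element. Here is a minimal, reproducible sketch of the same idea. The seed, the names wide and wide_custom, and the custom probs values are my own additions for illustration and are not from the question:

```r
library(data.table)

# Assumption: a fixed seed, added only so the sketch is reproducible.
set.seed(1)
x <- data.table(value = rnorm(100), g = rep(letters[1:5], each = 20))

# quantile() returns a named numeric vector. as.list() turns it into a
# named list, and data.table creates one column per list element,
# using the element names ("0%", "25%", ...) as column names.
wide <- x[, as.list(quantile(value)), by = g]

# The same pattern with a custom set of probabilities
# (values chosen purely for illustration).
wide_custom <- x[, as.list(quantile(value, probs = c(0.1, 0.5, 0.9))), by = g]

print(names(wide))
print(names(wide_custom))
```

The same as.list() wrapping should apply to any function that returns a fixed-length vector per group, such as range() or summary(), although unnamed results will get default column names (V1, V2, ...).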