Computing group means of matrix columns
I have a matrix mat and would like to average its columns within the groups defined by the grouping variable gp:
mat<-embed(1:5000,1461)
gp<-c(rep(1:365,each=4),366)
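For reference, embed(1:5000, 1461) returns a matrix with 5000 - 1461 + 1 = 3540 rows and 1461 columns, and gp holds one label per column: 365 groups of four columns plus a final group (366) that contains a single column. A quick check of those shapes:

```r
mat <- embed(1:5000, 1461)
gp <- c(rep(1:365, each = 4), 366)

dim(mat)           # 3540 rows, 1461 columns
length(gp)         # one group label per column of mat
table(table(gp))   # group sizes: 365 groups of size 4, 1 group of size 1
```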
Currently I use the following:
colavg<-t(aggregate(t(mat),list(gp),mean))
But it takes much longer than I expected. Any suggestions for speeding it up?
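To see what the aggregate idiom returns, here it is on a tiny made-up matrix (m and g are illustrative names, not from the question). The first row of the result holds the group labels (Group.1), so it has to be dropped to keep only the means. The call also converts t(mat) to a data frame and, as far as I understand aggregate.data.frame, calls mean once per column per group; for the real data that is roughly 3540 * 366, about 1.3 million calls, which is the likely reason it is slow.

```r
# Toy data: 2 rows, 6 columns, split into groups of 2, 3 and 1 columns
m <- matrix(1:12, nrow = 2)
g <- c(1, 1, 2, 2, 2, 3)

res <- t(aggregate(t(m), list(g), mean))
print(res)

# Row "Group.1" holds the group labels; rows "V1" and "V2" hold the means
means <- res[-1, , drop = FALSE]
```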
2 answers
Here is a fast approach; each step is explained in the code comments:
system.time({
# create a list of column indices per group
gp.list <- split(seq_len(ncol(mat)), gp)
# for each group, compute the row means
means.list <- lapply(gp.list, function(cols) rowMeans(mat[, cols, drop = FALSE]))
# paste everything together
colavg <- do.call(cbind, means.list)
})
# user system elapsed
# 0.08 0.00 0.08
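A small sanity check of this approach on a tiny made-up matrix (m and g are illustrative names), compared against the aggregate idiom from the question. The drop = FALSE matters for single-column groups such as group 366: without it, the subset collapses to a plain vector and rowMeans stops with an error, because it needs an array with at least two dimensions. The speed-up most likely comes from calling the vectorised rowMeans once per group instead of mean once per group and column.

```r
m <- matrix(1:12, nrow = 2)
g <- c(1, 1, 2, 2, 2, 3)   # group 3 has a single column

# Column indices per group: `1` = 1:2, `2` = 3:5, `3` = 6
g.list <- split(seq_len(ncol(m)), g)

# Row means within each group; drop = FALSE keeps a 1-column group as a matrix
means <- do.call(cbind, lapply(g.list, function(cols) rowMeans(m[, cols, drop = FALSE])))

# Same numbers as the aggregate() idiom, minus its Group.1 label row
ref <- t(aggregate(t(m), list(g), mean))[-1, ]
all.equal(unname(means), unname(ref))
```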
You can use an apply-style function, for example daply from the excellent plyr package:
# Create data
mat<-embed(1:5000,1461)
gp<-c(rep(1:365,each=4),366)
# Your code
system.time(colavg<-t(aggregate(t(mat),list(gp),mean)))
library(plyr)
# Put all data in a data frame
df <- data.frame(t(mat))
df$gp <- gp
# Using an apply function
system.time(colavg2 <- t(daply(df, .(gp), colMeans)))
Output:
> # Your code
> system.time(colavg<-t(aggregate(t(mat),list(gp),mean)))
user system elapsed
134.21 1.64 139.00
> # Using an apply function
> system.time(colavg2 <- t(daply(df, .(gp), colMeans)))
user system elapsed
52.78 0.06 53.23
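daply follows the split-apply-combine pattern: split the rows of df by gp, apply colMeans to each piece, and combine the results. The rough base-R sketch below shows the same pattern on a tiny made-up example (it is not how plyr works internally). It makes one detail visible: each piece still contains the gp column, so colMeans averages it too and the result carries an extra gp row, much like the Group.1 row from aggregate. The daply call above is given the same data frame, so its result should include a matching gp entry.

```r
m <- matrix(1:12, nrow = 2)
g <- c(1, 1, 2, 2, 2, 3)

df <- data.frame(t(m))   # one row per original column; data columns X1, X2
df$gp <- g

# Split rows by group, take column means of each piece, bind the results as columns
res <- sapply(split(df, df$gp), colMeans)
res["gp", ]             # averaged group labels, one per group
res[c("X1", "X2"), ]    # the group means themselves
```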