# Taking column means based on a grouping of columns

I have a matrix `mat` and would like to calculate the average of its columns based on the grouping variable `gp`.

```
mat <- embed(1:5000, 1461)
gp <- c(rep(1:365, each = 4), 366)
```

For this I use the following:

```
colavg <- t(aggregate(t(mat), list(gp), mean))
```

But it takes much longer than I expected. Any suggestions for speeding this up?


Here is a fast algorithm; the comments in the code explain each step.

```
system.time({

# create a list of column indices per group
gp.list    <- split(seq_len(ncol(mat)), gp)

# for each group, compute the row means
means.list <- lapply(gp.list, function(cols) rowMeans(mat[, cols, drop = FALSE]))

# bind the per-group results together, one column per group
colavg     <- do.call(cbind, means.list)

})
#    user  system elapsed
#    0.08    0.00    0.08
```
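To see what the three steps above produce, here is a minimal sketch on a tiny matrix. The 3 x 5 `mat` and the five-element `gp` are made-up stand-ins for the question's data, and `ref` and `ref.mat` are names introduced only for this check. It compares the result with the original `aggregate()` call after dropping the `Group.1` label column that `aggregate()` adds.

```r
# Tiny stand-in data (an assumption for illustration): 3 rows, 5 columns, 3 groups
mat <- matrix(1:15, nrow = 3)
gp  <- c(1, 1, 2, 2, 3)

# Same three steps as in the answer
gp.list    <- split(seq_len(ncol(mat)), gp)   # column indices per group
means.list <- lapply(gp.list, function(cols) rowMeans(mat[, cols, drop = FALSE]))
colavg     <- do.call(cbind, means.list)      # one column of row means per group

# Reference result from the aggregate() approach in the question
ref     <- aggregate(t(mat), list(gp), mean)
ref.mat <- t(as.matrix(ref[, -1]))            # drop Group.1, then transpose

# Compare values only, ignoring row and column names
all.equal(unname(colavg), unname(ref.mat))
```

Note that `t(aggregate(...))` in the question also carries the group labels as an extra first row, which this approach does not produce.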

You can use an apply-style function, for example `daply` from the excellent `plyr` package:

```
# Create data
mat <- embed(1:5000, 1461)
gp <- c(rep(1:365, each = 4), 366)

system.time(colavg <- t(aggregate(t(mat), list(gp), mean)))

library(plyr)
# Put all data in a data frame, one row per column of mat
df <- data.frame(t(mat))
df$gp <- gp

# Using an apply function: column means of each group's rows
system.time(colavg2 <- t(daply(df, .(gp), colMeans)))
```

Output:

```
> # Your code
> system.time(colavg<-t(aggregate(t(mat),list(gp),mean)))
   user  system elapsed
 134.21    1.64  139.00

> # Using an apply function
> system.time(colavg2 <- t(daply(df, .(gp), colMeans)))
   user  system elapsed
  52.78    0.06   53.23
```
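One more base-R option, not taken from any of the answers here, is `rowsum()`, which sums the rows of a matrix by group. The sketch below is offered as something to time on your own machine rather than as a measured result. The names `sums`, `counts`, `colavg3` and `ref` are introduced only for illustration, and the last lines cross-check the values against the `split()`/`rowMeans()` computation.

```r
# Same data as in the question
mat <- embed(1:5000, 1461)
gp  <- c(rep(1:365, each = 4), 366)

# rowsum() groups rows, so transpose first to group the columns of mat
sums    <- rowsum(t(mat), gp)      # one row per group, one column per row of mat
counts  <- as.vector(table(gp))    # number of columns in each group, in sorted group order
colavg3 <- t(sums / counts)        # counts recycles down each column; transpose back

# Cross-check against the split()/rowMeans() computation
ref <- do.call(cbind, lapply(split(seq_len(ncol(mat)), gp),
                             function(cols) rowMeans(mat[, cols, drop = FALSE])))
all.equal(unname(colavg3), unname(ref))
```

The division relies on `sums` having exactly one row per group, so that the vector `counts` lines up with each column as it is recycled.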