Matrix Averaging repeated rows
I have a CSV file with two columns (see Matrix 1 below for an example). I would like to write a program that averages the second column over all rows that share the same value in the first column. For example, in the matrix below there are two rows with "2" in the first column. These rows should be averaged into a single row ((356 + 456) / 2 = 406), and so on. The final result should look like Matrix 2 below. Any ideas how to do this?
Matrix 1
mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568,
998, 876, 895), .Dim = c(8L, 2L))
mat1
[,1] [,2]
[1,] 1 234
[2,] 2 356
[3,] 2 456
[4,] 3 745
[5,] 4 568
[6,] 4 998
[7,] 4 876
[8,] 5 895
Matrix 2
mat2 <- structure(c(1, 2, 3, 4, 5, 234, 406, 745, 814, 895), .Dim = c(5L, 2L))
mat2
[,1] [,2]
[1,] 1 234
[2,] 2 406
[3,] 3 745
[4,] 4 814
[5,] 5 895
If the first column is always in numerical order, you can try
# Use / rather than %/% here: integer division would truncate non-integer means
cbind(unique(mat1[,1]), rowsum(mat1[,2], mat1[,1]) / matrix(table(mat1[,1])))
# [,1] [,2]
# [1,] 1 234
# [2,] 2 406
# [3,] 3 745
# [4,] 4 814
# [5,] 5 895
rowsum is known to be more efficient than aggregate and tapply. However, it has obvious limitations. It would be nice if there were a rowmean function to calculate grouped means.
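As a rough sketch of what such a helper might look like, the function below divides the rowsum totals by group sizes that are themselves computed with rowsum, and takes the keys from the row names of the result. The name group_means is hypothetical (it is not part of base R), and the sketch assumes the keys are numeric. Because sums and counts come from the same grouping, it should not depend on the first column being sorted.

```r
mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568,
                    998, 876, 895), .Dim = c(8L, 2L))

# Hypothetical helper: mean of `values` within each distinct value of `keys`.
# Assumes `keys` is numeric; groups come back sorted by key.
group_means <- function(keys, values) {
  sums   <- rowsum(values, keys)                 # per-group totals, one row per key
  counts <- rowsum(rep(1, length(keys)), keys)   # per-group sizes, same row order
  unname(cbind(as.numeric(rownames(sums)), sums / counts))
}

group_means(mat1[, 1], mat1[, 2])

# Same result when the rows are shuffled first
shuffled <- mat1[c(8, 3, 5, 1, 7, 2, 6, 4), ]
group_means(shuffled[, 1], shuffled[, 2])
```

Both calls should give the five rows of Matrix 2 from the question.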
Another base R option is
s <- unname(split(mat1[,2], mat1[,1]))
cbind(unique(mat1[,1]), vapply(s, mean, 1))
# [,1] [,2]
# [1,] 1 234
# [2,] 2 406
# [3,] 3 745
# [4,] 4 814
# [5,] 5 895
A safer alternative to the approaches above is to convert to a data frame. Here I use dplyr for efficiency.
library(dplyr)
df <- group_by(as.data.frame(mat1), V1) %>% summarise(mean(V2))
as.matrix(unname(df))
# [,1] [,2]
# [1,] 1 234
# [2,] 2 406
# [3,] 3 745
# [4,] 4 814
# [5,] 5 895
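For comparison, the same data-frame idea can be sketched in base R with aggregate, without the dplyr dependency, though likely slower on large inputs as noted earlier. The column names V1 and V2 are the defaults that as.data.frame assigns to a matrix without column names. The variable name res is just for illustration.

```r
mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568,
                    998, 876, 895), .Dim = c(8L, 2L))

df  <- as.data.frame(mat1)                         # columns are named V1 and V2
res <- aggregate(V2 ~ V1, data = df, FUN = mean)   # mean of V2 within each V1
unname(as.matrix(res))                             # back to a plain matrix
```

The groups come back sorted by V1, so an unsorted first column is not a problem here.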