Averaging repeated rows in a matrix

I have a CSV file with 2 columns (see Matrix 1 below for an example). I would like to write a program that averages the second column of a matrix over all duplicate values in the first column. For example, in the matrix below, there are two rows with "2" in the first column. These rows should be averaged into one row ((356 + 456) / 2 = 406), and so on. The final result should be Matrix 2 below. Any ideas how to do this?

Matrix 1

mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568, 
            998, 876, 895), .Dim = c(8L, 2L))
mat1
     [,1] [,2]
[1,]    1  234
[2,]    2  356
[3,]    2  456
[4,]    3  745
[5,]    4  568
[6,]    4  998
[7,]    4  876
[8,]    5  895

      

Matrix 2

mat2 <- structure(c(1, 2, 3, 4, 5, 234, 406, 745, 814, 895), .Dim = c(5L, 2L))
mat2
     [,1] [,2]
[1,]    1  234
[2,]    2  406
[3,]    3  745
[4,]    4  814
[5,]    5  895

      



5 answers


What about



as.matrix(aggregate(mat1[,2],by = list(mat1[,1]),FUN = mean))

      



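Using only base R:



# Group means as a named vector; the names are the group keys
x <- tapply(mat1[,2], mat1[,1], mean)
# as.numeric() rather than as.integer(), so non-integer keys are not truncated
matrix(c(as.numeric(names(x)), x), ncol = 2)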

      



The easiest way is to use tapply:

tapply(mat1[,2], mat1[,1], mean)

      



If the first column is always in numerical order, you can try

# Use `/` rather than `%/%`: integer division would truncate non-integer means
cbind(unique(mat1[,1]), rowsum(mat1[,2], mat1[,1]) / matrix(table(mat1[,1])))
#      [,1] [,2]
# [1,]    1  234
# [2,]    2  406
# [3,]    3  745
# [4,]    4  814
# [5,]    5  895

      

rowsum is known to be more efficient than aggregate and tapply. However, it has obvious limitations. It would be nice if there were a rowmean function to compute grouped means on matrices.
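Such a helper can be sketched on top of rowsum. This is only an illustration: `rowmean` is a hypothetical name, not a base R function, and the sketch assumes the grouping column has no missing values. It reads the keys from the row names that rowsum produces instead of from unique(), so it should not depend on the first column being sorted.

```r
# Hypothetical helper (not part of base R): grouped means built on rowsum().
# Assumes `group` contains no NA values.
rowmean <- function(x, group) {
  sums <- rowsum(x, group)                            # per-group sums, one row per key
  counts <- as.vector(table(group)[rownames(sums)])   # group sizes, matched by key
  sums / counts                                       # true division, not %/%
}

mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568,
                    998, 876, 895), .Dim = c(8L, 2L))

m <- rowmean(mat1[, 2], mat1[, 1])
# Keys come from the row names of the result; unname() drops the dimnames
res <- unname(cbind(as.numeric(rownames(m)), m[, 1]))
res
```

Because the division recycles the counts down each column, the same helper should also work when x is a matrix with several value columns.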

Another base R option is

s <- unname(split(mat1[,2], mat1[,1]))
cbind(unique(mat1[,1]), vapply(s, mean, 1))
#      [,1] [,2]
# [1,]    1  234
# [2,]    2  406
# [3,]    3  745
# [4,]    4  814
# [5,]    5  895
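To make the split/vapply steps easier to follow, here is a slightly more explicit sketch of the same idea. The variable names (`groups`, `means`, `res`) are mine, and the keys are taken from the names of the split list rather than from unique(), which is an assumption-free way to keep keys and means aligned.

```r
mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568,
                    998, 876, 895), .Dim = c(8L, 2L))

# split() returns a named list: one numeric vector of values per key
groups <- split(mat1[, 2], mat1[, 1])

# vapply() applies mean to each element; FUN.VALUE = numeric(1) declares
# that every call must return a single number (the `1` above does the same)
means <- vapply(groups, mean, FUN.VALUE = numeric(1))

# Rebuild the two-column matrix from the list names and the means
res <- cbind(as.numeric(names(means)), unname(means))
res
```

Unlike sapply, vapply raises an error if any group yields something other than a single number, which is why it is often preferred in non-interactive code.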

      

A safer alternative to the approaches above is to convert to a data frame. Here I use dplyr for efficiency.

library(dplyr)
df <- group_by(as.data.frame(mat1), V1) %>% summarise(mean(V2))
as.matrix(unname(df))
#      [,1] [,2]
# [1,]    1  234
# [2,]    2  406
# [3,]    3  745
# [4,]    4  814
# [5,]    5  895

      



The answer from @LeoRJorge is 98% of the way to the desired result; it just needs to be unnamed (if that is really needed):

unname(as.matrix(aggregate(mat1[,2], list(mat1[,1]), mean)))

     [,1] [,2]
[1,]    1  234
[2,]    2  406
[3,]    3  745
[4,]    4  814
[5,]    5  895
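A quick way to confirm that the unnamed result matches the Matrix 2 from the question is to compare the two with all.equal. This is a minimal sketch; `res` is a name introduced here for illustration.

```r
mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568,
                    998, 876, 895), .Dim = c(8L, 2L))
mat2 <- structure(c(1, 2, 3, 4, 5, 234, 406, 745, 814, 895), .Dim = c(5L, 2L))

# Aggregate, convert to a matrix, then drop the row and column names
res <- unname(as.matrix(aggregate(mat1[, 2], list(mat1[, 1]), mean)))

# all.equal() compares values with a small numeric tolerance and also
# checks attributes such as dim; stopifnot() errors if they differ
stopifnot(isTRUE(all.equal(res, mat2)))
```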

      
