Matrix Averaging repeated rows

Question

I have a CSV file with 2 columns (see Matrix 1 below for an example). I would like to write a program that averages the second column of a matrix over all rows that share the same number in the first column. For example, in the matrix below there are two rows with "2" in the first column. These rows should be averaged into a single row ((356 + 456) / 2 = 406), and so on. The final matrix should look like Matrix 2 below. Any ideas how to do this?

Matrix 1

mat1 <- structure(c(1, 2, 2, 3, 4, 4, 4, 5, 234, 356, 456, 745, 568, 
            998, 876, 895), .Dim = c(8L, 2L))
mat1
     [,1] [,2]
[1,]    1  234
[2,]    2  356
[3,]    2  456
[4,]    3  745
[5,]    4  568
[6,]    4  998
[7,]    4  876
[8,]    5  895

Matrix 2

mat2 <- structure(c(1, 2, 3, 4, 5, 234, 406, 745, 814, 895), .Dim = c(5L, 2L))
mat2
     [,1] [,2]
[1,]    1  234
[2,]    2  406
[3,]    3  745
[4,]    4  814
[5,]    5  895
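
Since the data starts out in a CSV file, a minimal sketch of getting it into a matrix like mat1 might look as follows. The inline text only stands in for the file, and the assumption that the file has no header row is mine; for a real file, its path would be passed to read.csv instead of the text argument.

```r
# Stand-in for the CSV file's contents (assumed: two columns, no header row)
csv_text <- "1,234
2,356
2,456
3,745
4,568
4,998
4,876
5,895"

# read.csv returns a data frame; as.matrix converts it to a matrix
# and unname drops the automatic V1/V2 column names
mat1 <- unname(as.matrix(read.csv(text = csv_text, header = FALSE)))
dim(mat1)
```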

+3

r

CPL 24 nov. 14 at 19:06

5 answers

Using only base R:

x <- tapply(mat1[,2], mat1[,1], mean)
matrix(c(as.integer(names(x)), x), ncol = 2)
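
As a quick check, the result can be compared with the mat2 from the question. This is only a sketch: the names res and expected are mine, and as.numeric is used here in place of as.integer so that keys which are not whole numbers would survive the conversion.

```r
mat1 <- matrix(c(1, 2, 2, 3, 4, 4, 4, 5,
                 234, 356, 456, 745, 568, 998, 876, 895), ncol = 2)
expected <- matrix(c(1, 2, 3, 4, 5, 234, 406, 745, 814, 895), ncol = 2)

# tapply returns a named 1-d array: the names are the group keys,
# the values are the group means
x <- tapply(mat1[, 2], mat1[, 1], mean)

# as.numeric (rather than as.integer) keeps non-integer keys intact
res <- matrix(c(as.numeric(names(x)), x), ncol = 2)
all.equal(res, expected)
```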

+2

mmuurr 24 nov. 14 at 19:20

The easiest way is to use tapply:

tapply(mat1[,2], mat1[,1], mean)

+1

Thilo 24 nov. 14 at 19:17

If the first column is always in numerical order, you can try

cbind(unique(mat1[,1]), rowsum(mat1[,2], mat1[,1]) / matrix(table(mat1[,1])))
#      [,1] [,2]
# [1,]    1  234
# [2,]    2  406
# [3,]    3  745
# [4,]    4  814
# [5,]    5  895

rowsum is known to be more efficient than aggregate and tapply. However, there are obvious limitations. It would be nice if there were a function rowmean to calculate grouped means of a matrix.
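
The wished-for function can be approximated in a few lines. The following is a rough sketch, not an existing base R function: the name group_means is hypothetical, and it simply divides the grouped sums from rowsum by the group sizes from table.

```r
# Hypothetical helper: grouped column means of a numeric vector or matrix
group_means <- function(x, group) {
  sums <- rowsum(x, group)            # one row per group, sorted by key
  counts <- as.vector(table(group))   # group sizes, in the same sorted order
  sums / counts                       # counts are recycled down each column
}

mat1 <- matrix(c(1, 2, 2, 3, 4, 4, 4, 5,
                 234, 356, 456, 745, 568, 998, 876, 895), ncol = 2)

m <- group_means(mat1[, 2], mat1[, 1])

# The group keys end up in the row names of the result
res <- cbind(as.numeric(rownames(m)), unname(m[, 1]))
res
```

Because the division recycles the counts down each column, the same helper should also work when x is a matrix with several value columns.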

Another base R option is

s <- unname(split(mat1[,2], mat1[,1]))
cbind(unique(mat1[,1]), vapply(s, mean, 1))
#      [,1] [,2]
# [1,]    1  234
# [2,]    2  406
# [3,]    3  745
# [4,]    4  814
# [5,]    5  895
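
Both snippets above pair unique(mat1[,1]) with groups that R returns in sorted order, which is why the first column has to be sorted already. A variant that should avoid that requirement takes the keys from the names of the split result instead. The shuffled matrix below is made up purely for illustration.

```r
mat1 <- matrix(c(1, 2, 2, 3, 4, 4, 4, 5,
                 234, 356, 456, 745, 568, 998, 876, 895), ncol = 2)

# Made-up example: the same rows as mat1, but in a different order
shuffled <- mat1[c(8, 3, 5, 1, 7, 2, 4, 6), ]

# split returns a list of value vectors, named by key and sorted by key
s <- split(shuffled[, 2], shuffled[, 1])

# Take the keys from names(s) so that keys and means stay aligned
res <- cbind(as.numeric(names(s)),
             vapply(s, mean, numeric(1), USE.NAMES = FALSE))
res
```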

A safer alternative to the approaches above would be to convert to a data frame. Here I am using dplyr for efficiency.

library(dplyr)
df <- group_by(as.data.frame(mat1), V1) %>% summarise(mean(V2))
unname(as.matrix(df))
#      [,1] [,2]
# [1,]    1  234
# [2,]    2  406
# [3,]    3  745
# [4,]    4  814
# [5,]    5  895

+1

Rich scriven 24 nov. 14 at 19:53

The answer from @LeoRJorge is 98% of the way to the desired result; the matrix just needs to be unnamed (if that is really needed):

unname(as.matrix(aggregate(mat1[,2], list(mat1[,1]), mean)))

     [,1] [,2]
[1,]    1  234
[2,]    2  406
[3,]    3  745
[4,]    4  814
[5,]    5  895

+1

goangit 25 nov. 14 at 4:15 am

LeoRJorge · Accepted Answer · 2014-11-24T19:17:41+0000

What about

as.matrix(aggregate(mat1[,2],by = list(mat1[,1]),FUN = mean))
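
For context, aggregate returns a data frame whose columns are named Group.1 and x when called this way, which is why the resulting matrix carries column names. A small sketch of the same call with the intermediate step spelled out (the variable names agg and res are mine):

```r
mat1 <- matrix(c(1, 2, 2, 3, 4, 4, 4, 5,
                 234, 356, 456, 745, 568, 998, 876, 895), ncol = 2)

# One row per distinct key in column 1, with the mean of column 2
agg <- aggregate(mat1[, 2], by = list(mat1[, 1]), FUN = mean)
names(agg)   # "Group.1" "x"

# as.matrix keeps those names as column names of the matrix
res <- as.matrix(agg)
colnames(res)
```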
