Count previous occurrences of a row across multiple columns in R

Question

I have a four-column matrix with a chronological index in the first column and three columns of names. Here is some toy data:

x = rbind(c(1,"sam","harry","joe"), c(2,"joe","sam","jack"),c(3,"jack","joe","jill"),c(4,"harry","jill","joe"))

I want to create three additional vectors that count (for each row) any previous (but not subsequent) occurrences of the name. Here's the desired output for the toy data:

y = rbind(c(0,0,0),c(1,1,0),c(1,2,0),c(1,1,3))

I am having a hard time approaching the problem and have searched Stack Overflow for relevant examples. dplyr offers ways to compute overall counts, but (as far as I can tell) not running counts across a series of rows.

I tried to write a function that solves the problem for a single column, but had no luck, i.e.

thing = sapply(x,function(i)length(grep(i,x[x[1:i]])))

Any advice would be appreciated.

r sapply

toddntucker May 14 '15 at 16:33

2 answers

You can do:

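el <- unique(c(x[, -1]))
val <- Reduce(`+`, lapply(el, function(u) {
  b <- c(t(x[, -1])) == u    # flag this name's positions, read row by row
  b[b] <- cumsum(b[b]) - 1   # number of earlier occurrences at each flagged position
  b
}))

matrix(val, ncol = ncol(x[, -1]), byrow = TRUE)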
#         [,1] [,2] [,3]
#[1,]    0    0    0
#[2,]    1    1    0
#[3,]    1    2    0
#[4,]    1    1    3
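
The one-liner above is fairly dense, so here is a minimal sketch of what it appears to do for a single name, assuming x is the toy matrix from the question. The names flat, is_joe and joe_counts are introduced only for this illustration and are not part of the answer. Each name contributes a vector that is zero everywhere except at its own positions, where it holds the count of earlier occurrences; Reduce then adds these vectors together.

```r
# Toy data from the question
x <- rbind(c(1, "sam", "harry", "joe"),
           c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"),
           c(4, "harry", "jill", "joe"))

# Flatten the name columns row by row (t() makes c() read across rows)
flat <- c(t(x[, -1]))

# Contribution of a single name, here "joe" (illustrative choice)
is_joe <- flat == "joe"
joe_counts <- numeric(length(flat))
joe_counts[is_joe] <- cumsum(is_joe[is_joe]) - 1
joe_counts
# 0 0 0 1 0 0 0 2 0 0 0 3
```

Because the matrix is read row by row, an earlier column in the same row also counts as a previous occurrence.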

Colonel Beauvel May 14 '15 at 16:49

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2015-05-14T16:43:11+0000

This is a typical ave + seq_along problem, but we need to convert the data to a vector first:

t(`dim<-`(ave(rep(1, prod(dim(x[, -1]))), 
              c(t(x[, -1])), FUN = seq_along)  - 1, 
          rev(dim(x[, -1]))))
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3

Perhaps more readable:

## x without the first column as a vector
x_vec <- c(t(x[, -1]))

## The values that you are looking to obtain...
y_vals <- ave(rep(1, length(x_vec)), x_vec, FUN = seq_along) - 1

## ... in the format you want to obtain them
matrix(y_vals, ncol = ncol(x) - 1, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    1    2    0
# [4,]    1    1    3
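
To see why ave with seq_along gives a running count, it may help to try it on a small vector first and then wrap the steps in a function. This is only a sketch under stated assumptions: the vector v and the helper name count_previous are hypothetical and do not appear in the answer, and x and y are the toy input and desired output from the question.

```r
# Toy data and desired output from the question
x <- rbind(c(1, "sam", "harry", "joe"),
           c(2, "joe", "sam", "jack"),
           c(3, "jack", "joe", "jill"),
           c(4, "harry", "jill", "joe"))
y <- rbind(c(0, 0, 0), c(1, 1, 0), c(1, 2, 0), c(1, 1, 3))

# ave() applies FUN within each group and returns the results in the
# original order, so seq_along numbers each value's occurrences 1, 2, 3, ...
v <- c("a", "b", "a", "a", "b")
running <- ave(seq_along(v), v, FUN = seq_along) - 1
running
# 0 0 1 2 1

# Hypothetical helper wrapping the same steps for a matrix of names
count_previous <- function(m) {
  vec <- c(t(m))                                        # flatten row by row
  counts <- ave(seq_along(vec), vec, FUN = seq_along) - 1
  matrix(counts, ncol = ncol(m), byrow = TRUE)          # restore the shape
}

res <- count_previous(x[, -1])
all(res == y)
# TRUE
```

Subtracting 1 turns the occurrence number into the count of previous occurrences, which is what the question asks for.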
