R: an efficient way to use values as indices

I have a matrix with 10M rows of integer values. A row in this matrix might look like this:

1 1 1 1 2

I need to convert this row into the following vector, which counts how many times each value from 1 to 9 occurs:

4 1 0 0 0 0 0 0 0

Another example:

1 2 3 4 5

Should become:

1 1 1 1 1 0 0 0 0

How can I do this efficiently in R?

Update: there is a function that does exactly what I need, base::tabulate (suggested here earlier), but it is extremely slow: it takes at least 15 minutes to go through my original matrix.
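For reference, this is what tabulate does on a single row. Passing nbins = 9 fixes the output length, so values that do not occur still get a zero count (this assumes the values run from 1 to 9, as in the examples). Applying it row by row with apply() is the slow approach the update refers to; the small matrix x_small below is just the two example rows.

```r
# one row from the first example
row1 <- c(1L, 1L, 1L, 1L, 2L)

# count occurrences of each value 1..9; nbins fixes the output length at 9
counts1 <- tabulate(row1, nbins = 9)
print(counts1)  # 4 1 0 0 0 0 0 0 0

# row-by-row version on a small matrix holding both example rows;
# apply() returns one column per input row, so transpose the result
x_small <- rbind(c(1L, 1L, 1L, 1L, 2L),
                 c(1L, 2L, 3L, 4L, 5L))
counts_small <- t(apply(x_small, 1, tabulate, nbins = 9))
print(counts_small)
```

This gives the right answer, but it calls tabulate once per row, which is what makes it slow on 10 million rows.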



1 answer


I would try something like this:

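m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)
j.idx <- seq_len(n)

# one row per input row, one column per value 1..max(x)
# (use 9L instead of max(x) to always get 9 columns)
out <- matrix(0L, m, max(x))

# loop over the few columns; each iteration is vectorized over all rows
for (j in j.idx) {
   # (row i, value x[i, j]) pairs: matrix indexing addresses out[i, x[i, j]]
   # for every row at once; each row appears once per column, so no
   # cell is incremented twice within one assignment
   ij <- cbind(i.idx, x[, j])
   out[ij] <- out[ij] + 1L
}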

A for loop may seem surprising in an answer to a question about an efficient implementation. However, the loop body is vectorized over an entire column, and the loop only runs over five columns. This is many times faster than looping over 10 million rows with apply.
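To check the loop against the two examples in the question, it can be wrapped in a small function. The name count_values and its nbins argument are hypothetical, added only for this sketch; nbins defaults to max(x) as in the answer, and passing 9 reproduces the 9-element vectors from the question even when the largest value is absent.

```r
# Hypothetical helper wrapping the column loop from the answer.
count_values <- function(x, nbins = max(x)) {
  m <- nrow(x)
  i.idx <- seq_len(m)
  out <- matrix(0L, m, nbins)
  for (j in seq_len(ncol(x))) {
    # pairs (row i, value x[i, j]); each row appears once per column,
    # so there are no duplicate cells within one assignment
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

# the two example rows from the question
x_small <- rbind(c(1L, 1L, 1L, 1L, 2L),
                 c(1L, 2L, 3L, 4L, 5L))
res <- count_values(x_small, nbins = 9)
print(res)
```

The first row of res is 4 1 0 0 0 0 0 0 0 and the second is 1 1 1 1 1 0 0 0 0, matching the expected output.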



Testing with:

n <- 1e7  # 10 million rows
m <- 5    # 5 columns
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

this approach takes less than six seconds, while the naive t(apply(x, 1, tabulate, 9)) takes about two minutes.
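A scaled-down version of this comparison can also confirm that both approaches return the same counts. This is a sketch only: it uses n = 1e5 instead of 1e7 so the apply() version finishes quickly, the seed is arbitrary, and the timings it prints will depend on the machine rather than reproduce the figures above.

```r
set.seed(1)  # arbitrary seed, for a reproducible sample
n <- 1e5     # scaled down from 1e7
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

# column-loop approach from the answer, with 9 bins fixed
loop_time <- system.time({
  i.idx <- seq_len(nrow(x))
  out <- matrix(0L, nrow(x), 9L)
  for (j in seq_len(ncol(x))) {
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
})

# naive row-by-row approach
apply_time <- system.time(
  naive <- t(apply(x, 1, tabulate, nbins = 9))
)

print(loop_time)
print(apply_time)
print(all(out == naive))  # both methods give the same counts
```

Every row of out sums to 5, one count per column of x.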
