R an efficient way to use values as indices

Question

I have a matrix with 10 million rows of integer values. A row in this matrix might look like this:

1 1 1 1 2

I need to convert the row above into the following vector of counts, where position k holds the number of times the value k occurs:

4 1 0 0 0 0 0 0 0

Another example:

1 2 3 4 5

To:

1 1 1 1 1 0 0 0 0

How can I do this efficiently in R?

Update: There is a function that does exactly what I need, base::tabulate (suggested here earlier), but it is extremely slow: it takes at least 15 minutes to go through my initial matrix.

+3

performance r

YevgenyM 12 nov. 14 at 13:22

1 answer

flodel · Accepted Answer · 2014-11-12T13:46:32+0000

I would try something like this:

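m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)   # row indices
j.idx <- seq_len(n)   # column indices

# one output column per value from 1 to max(x); all counts start at zero
out <- matrix(0L, m, max(x))

for (j in j.idx) {
   # pair each row index with that row's value in column j
   ij <- cbind(i.idx, x[, j])
   # a two-column index matrix addresses exactly one cell per row
   out[ij] <- out[ij] + 1L
}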

A for loop may seem surprising for a question that asks for an efficient implementation. However, this solution is vectorized within each column and loops over only five columns. That is many times faster than looping over 10 million rows with apply.
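To make the column-wise step concrete, here is a minimal sketch that runs the same loop on the two example rows from the question. It assumes the values lie in 1..9 and fixes the output width with nbins (a name introduced here for illustration) rather than max(x), so the result has nine columns as in the question.

```r
# The two example rows from the question; values are assumed to lie in 1..9.
x <- matrix(c(1L, 1L, 1L, 1L, 2L,
              1L, 2L, 3L, 4L, 5L), nrow = 2, byrow = TRUE)
nbins <- 9L  # illustrative name: number of distinct values to count

out <- matrix(0L, nrow(x), nbins)
for (j in seq_len(ncol(x))) {
  # Each row of ij is a (row, value) pair, so out[ij] addresses one cell
  # per input row. Row numbers are unique within a column, so no cell is
  # updated twice in the same pass.
  ij <- cbind(seq_len(nrow(x)), x[, j])
  out[ij] <- out[ij] + 1L
}
print(out)
# row 1: 4 1 0 0 0 0 0 0 0
# row 2: 1 1 1 1 1 0 0 0 0
```

Each pass updates every row at once, so the number of interpreted loop iterations equals the number of columns, not the number of rows.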

Testing with:

n <- 1e7
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

this approach takes less than six seconds, whereas the naive t(apply(x, 1, tabulate, 9)) takes about two minutes.
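For anyone who wants to reproduce the comparison, the following is a rough sketch that checks both approaches give the same counts and times them. It uses 1e5 rows instead of 1e7 so that it finishes quickly. The helper name count_by_column, the seed, and the fixed width of 9 are assumptions made for this example, and the timings will differ from machine to machine.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1e5      # far fewer rows than the 1e7 used in the answer
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

# Hypothetical helper wrapping the column-wise loop from the answer,
# with a fixed number of output columns (nbins) instead of max(x).
count_by_column <- function(x, nbins) {
  out <- matrix(0L, nrow(x), nbins)
  i.idx <- seq_len(nrow(x))
  for (j in seq_len(ncol(x))) {
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

t_loop  <- system.time(a <- count_by_column(x, 9L))
t_apply <- system.time(b <- t(apply(x, 1, tabulate, 9)))

# Both results should be n-by-9 matrices holding the same counts.
stopifnot(identical(dim(a), dim(b)), all(a == b))

cat("loop: ", t_loop[["elapsed"]], "s\n")
cat("apply:", t_apply[["elapsed"]], "s\n")
```

The apply version calls tabulate once per row, which is where the per-row overhead comes from; the loop version does a fixed number of vectorized updates regardless of the row count.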
