Draw a chain of Markov chains in the transition matrix R

Let be the trans_m

transition matrix n

for n

first-order mark chains. My problem is n

large, say 10,000, and the matrix trans_m

is a sparse matrix built from a batch Matrix

. Otherwise, the size trans_m

will be huge. My goal is to simulate the sequence of the chain of marks given by the vector of initial states s1

and this transition matrix trans_m

. Consider the following specific example.

    n <- 5000 # there are 5,000 states in this case.
    trans_m <- Matrix(0, nr = n, nc = n, sparse = TRUE)
    K <- 5 # the maximal number of states that could be reached.
    for(i in 1:n){
        states_reachable <- sample(1:n, size = K) # randomly pick K states that can be reached with equal probability.
        trans_m[i, states_reachable] <- 1/K
    }
    s1 <- sample(1:n, size = 1000, replace = TRUE) # generate 1000 inital states
    draw_next <- function(s) {
        .s <- sample(1:n, size = 1, prob = trans_m[s, ]) # given the current state s, draw the next state .s
        .s
    }
    sapply(s1, draw_next)

      

Given the initial state vector s1

as above, I used sapply(s1, draw_next)

to draw the next state. When n

more, sapply

it becomes slow. Is there a better way?

+3


source to share


1 answer


Repeatedly indexing by row can be slow, so it is faster to work with transposition of the transition matrix and use column indexing and also to exclude indexing from the inner function:

R>    trans_m_t <- t(trans_m)
R>
R>    require(microbenchmark)
R>    microbenchmark(
+       apply(trans_m_t[,s1], 2,sample, x=n, size=1, replace=F)
+     ,
+       sapply(s1, draw_next)
+     )
Unit: milliseconds
                                                            expr        min
 apply(trans_m_t[, s1], 2, sample, x = n, size = 1, replace = F) 111.828814
                                           sapply(s1, draw_next) 499.255402
          lq        mean      median          uq        max neval
 193.1139810 190.4379185 194.6563380 196.4273105 270.418189   100
 503.7398805 512.0849013 506.9467125 516.6082480 586.762573   100

      



Since you are already working with a sparse matrix, you can get even better performance by working directly on triplets. Higher-level matrix operators can cause recompression.

+1


source







All Articles