Draw a chain of Markov chains in the transition matrix R
Let be the trans_m
transition matrix n
for n
first-order mark chains. My problem is n
large, say 10,000, and the matrix trans_m
is a sparse matrix built from a batch Matrix
. Otherwise, the size trans_m
will be huge. My goal is to simulate the sequence of the chain of marks given by the vector of initial states s1
and this transition matrix trans_m
. Consider the following specific example.
n <- 5000 # there are 5,000 states in this case.
trans_m <- Matrix(0, nr = n, nc = n, sparse = TRUE)
K <- 5 # the maximal number of states that could be reached.
for(i in 1:n){
states_reachable <- sample(1:n, size = K) # randomly pick K states that can be reached with equal probability.
trans_m[i, states_reachable] <- 1/K
}
s1 <- sample(1:n, size = 1000, replace = TRUE) # generate 1000 inital states
draw_next <- function(s) {
.s <- sample(1:n, size = 1, prob = trans_m[s, ]) # given the current state s, draw the next state .s
.s
}
sapply(s1, draw_next)
Given the initial state vector s1
as above, I used sapply(s1, draw_next)
to draw the next state. When n
more, sapply
it becomes slow. Is there a better way?
source to share
Repeatedly indexing by row can be slow, so it is faster to work with transposition of the transition matrix and use column indexing and also to exclude indexing from the inner function:
R> trans_m_t <- t(trans_m)
R>
R> require(microbenchmark)
R> microbenchmark(
+ apply(trans_m_t[,s1], 2,sample, x=n, size=1, replace=F)
+ ,
+ sapply(s1, draw_next)
+ )
Unit: milliseconds
expr min
apply(trans_m_t[, s1], 2, sample, x = n, size = 1, replace = F) 111.828814
sapply(s1, draw_next) 499.255402
lq mean median uq max neval
193.1139810 190.4379185 194.6563380 196.4273105 270.418189 100
503.7398805 512.0849013 506.9467125 516.6082480 586.762573 100
Since you are already working with a sparse matrix, you can get even better performance by working directly on triplets. Higher-level matrix operators can cause recompression.
source to share