Best iterative way to calculate the fundamental matrix of an absorbing Markov chain?

I have a very large absorbing Markov chain. I want to compute the fundamental matrix of this chain in order to calculate the expected number of steps before absorption. From this question I know that it can be calculated by solving the equation

(I - Q) t = 1

which can be solved, with 1 being the all-ones vector, using the following Python code:

import numpy

def expected_steps_fast(Q):
    I = numpy.identity(Q.shape[0])
    o = numpy.ones(Q.shape[0])
    return numpy.linalg.solve(I - Q, o)

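For example, here is a minimal check on a hypothetical two-state chain (both states transient, the absorbing state already dropped from Q):

import numpy

# Hypothetical toy chain: each transient state moves to the other with
# probability 0.5 and is absorbed with the remaining probability 0.5.
Q = numpy.array([[0.0, 0.5],
                 [0.5, 0.0]])
print(expected_steps_fast(Q))  # [2. 2.]: two expected steps from either state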

However, I would like to calculate it with a kind of iterative method, similar to the power iteration method used to calculate PageRank. That would allow me to compute an approximation of the expected number of steps before absorption in a system like MapReduce.

Is there something like this?



2 answers


If you have a sparse matrix, check whether scipy.sparse.linalg.spsolve works. There are no guarantees about numerical stability, but, at least for this trivial example, it is significantly faster than the dense solver.

import networkx as nx
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def example(n):
    """Generate a very simple transition matrix from a directed graph
    """
    g = nx.DiGraph()
    for i in range(n - 1):
        g.add_edge(i + 1, i)
        g.add_edge(i, i + 1)
    g.add_edge(n - 1, n)
    g.add_edge(n, n)  # the last state is absorbing (self-loop)
    m = nx.to_numpy_array(g)
    # normalize rows to ensure m is a valid right stochastic matrix
    m = m / m.sum(axis=1, keepdims=True)
    return m

A = sp.csr_matrix(example(2000)[:-1,:-1])
Ad = A.toarray()

def sparse_solve(Q):
    I = sp.identity(Q.shape[0], format='csr')
    o = np.ones(Q.shape[0])
    return spla.spsolve(I - Q, o)

def dense_solve(Q):
    I = np.identity(Q.shape[0])
    o = np.ones(Q.shape[0])
    return np.linalg.solve(I - Q, o)


Timing for the sparse solver:

%timeit sparse_solve(A)
1000 loops, best of 3: 1.08 ms per loop




Timing for the dense solver:

%timeit dense_solve(Ad)
1 loops, best of 3: 216 ms per loop


As Tobias mentions in the comments, I would have expected specialized iterative solvers to beat the generic direct one, and they may for very large systems. For this toy example, the generic spsolve seems to work well enough.
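If you want to try an iterative solver, here is a minimal sketch using BiCGSTAB from scipy (the name krylov_solve is just illustrative, and convergence is not guaranteed for every chain):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def krylov_solve(Q):
    # Solve (I - Q) t = 1 with BiCGSTAB, an iterative Krylov solver,
    # instead of the direct sparse factorization used by spsolve.
    I = sp.identity(Q.shape[0], format='csr')
    o = np.ones(Q.shape[0])
    t, info = spla.bicgstab(I - Q, o)
    if info != 0:
        raise RuntimeError("BiCGSTAB did not converge (info=%d)" % info)
    return t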



I got this answer thanks to @tobias-ribizel's suggestion to use the Neumann series. If we start from the following equation:

t = (I - Q)^-1 1

Using the Neumann series:

t = sum_{k=0}^inf Q^k 1

Since each term of the series is just Q applied to the previous term times the vector 1, we can compute each term from the previous one, working on each row of the matrix Q separately and accumulating the partial sums:

t = sum_{k=0}^inf r_k, where r_0 = 1 and r_k = Q r_{k-1}

This is the Python code I am using to calculate this:

import numpy as np

def expected_steps_iterative(Q, n=10):
    N = Q.shape[0]
    acc = np.ones(N)    # partial sum, initialized with the k=0 term r_0 = 1
    r_k_1 = np.ones(N)  # previous term r_{k-1}
    for k in range(1, n):
        # r_k = Q r_{k-1}, written element-wise (equivalent to Q.dot(r_k_1))
        r_k = np.zeros(N)
        for i in range(N):
            for j in range(N):
                r_k[i] += r_k_1[j] * Q[i, j]
        # stop once the new term no longer changes the accumulated sum
        if np.allclose(acc, acc + r_k, rtol=1e-8):
            acc += r_k
            break
        acc += r_k
        r_k_1 = r_k
    return acc
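
As a quick sanity check, the approximation can be compared with the direct solution; a minimal sketch on the same hypothetical two-state chain as in the question:

Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])
approx = expected_steps_iterative(Q, n=50)
exact = np.linalg.solve(np.identity(2) - Q, np.ones(2))
print(approx, exact)  # both close to [2. 2.]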


And this is the code using Spark. It expects Q to be an RDD where each row is a tuple (row_id, dict of the non-zero weights in that row of the matrix).

import numpy as np

def expected_steps_spark(sc, Q, n=10):
    def dict2np(d, sz):
        vec = np.zeros(sz)
        for k, v in d.items():
            vec[k] = v
        return vec
    sz = Q.count()
    acc = np.ones(sz)                # partial sum, k=0 term included
    x = {i: 1.0 for i in range(sz)}  # current term r_{k-1} as a dict
    for k in range(1, n):
        bc_x = sc.broadcast(x)       # ship r_{k-1} to the workers
        x_old = x
        # r_k[u] = sum_j Q[u, j] * r_{k-1}[j], one map task per row
        x = Q.map(lambda row: (row[0],
                               sum(bc_x.value[j] * row[1][j] for j in row[1])))
        x = x.collectAsMap()
        v_old = dict2np(x_old, sz)
        v = dict2np(x, sz)
        acc += v
        if np.allclose(v, v_old, rtol=1e-8):
            break
    return acc
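
For completeness, a minimal sketch of how such an RDD might be built and fed in, assuming a local SparkContext and the same hypothetical two-state chain as above (only non-zero weights are stored per row):

from pyspark import SparkContext

sc = SparkContext("local", "expected-steps")
Q = sc.parallelize([(0, {1: 0.5}),   # state 0 -> state 1 with prob. 0.5
                    (1, {0: 0.5})])  # state 1 -> state 0 with prob. 0.5
print(expected_steps_spark(sc, Q, n=50))  # approaches [2. 2.]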


