Why should we use RNN instead of Markov models?

I recently came across this article, and I was curious to know how the results you get from a recurrent neural network like the ones described there differ from those of a simple Markov chain.

I don't really understand the linear algebra going on under the hood of an RNN, but it seems that you are basically just designing a very convoluted way to build a statistical model of what the next letter will be based on the previous letters, something that a Markov chain does very simply ...

Why are RNNs interesting? Is this simply because they are a more generalizable solution, or is there something I am missing?



1 answer


A Markov chain assumes the Markov property; it is "memoryless". The probability of the next symbol is computed from the k previous symbols only. In practice, k is limited to low values (say, 3-5) because the transition matrix grows exponentially with k. As a result, the sentences generated by a hidden Markov model are highly inconsistent.
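To make this concrete, here is a minimal sketch of a k-order character-level Markov chain (the filename `input.txt`, the choice of k=3, and the helper names are all illustrative, not from the article). Notice that the model can only condition on the last k characters; everything earlier is forgotten.

```python
from collections import defaultdict, Counter
import random

def train_markov(text, k=3):
    """Count how often each character follows each k-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        context, nxt = text[i:i + k], text[i + k]
        counts[context][nxt] += 1
    return counts

def generate(counts, seed, length=200):
    """Sample one character at a time from the learned transition counts."""
    k = len(seed)
    out = list(seed)
    for _ in range(length):
        context = "".join(out[-k:])
        dist = counts.get(context)
        if not dist:  # unseen context: nothing to sample from, stop early
            break
        chars, weights = zip(*dist.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = open("input.txt").read()  # hypothetical training text
model = train_markov(corpus, k=3)
print(generate(model, seed=corpus[:3]))
```

The exponential blow-up is visible here: with an alphabet of size V there are up to V^k possible contexts, which is why k rarely goes beyond 3-5 in practice.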

RNNs, on the other hand (for example, ones with LSTM units), are not bound by the Markov property. Their rich internal state allows them to track long-range dependencies.
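This is not the code from the article, just a sketch of the idea in PyTorch (the layer sizes, vocabulary size of 128, and the dummy batch are assumptions for illustration). The key difference from the Markov chain above is that the hidden state is carried forward across the whole sequence rather than being cut off after k characters.

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Minimal character-level LSTM language model."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x: (batch, seq_len) of character indices
        emb = self.embed(x)
        out, state = self.lstm(emb, state)  # state carries memory across steps
        return self.head(out), state        # logits over the next character

# Training-step sketch: predict each next character from all previous ones.
model = CharLSTM(vocab_size=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randint(0, 128, (8, 100))   # dummy batch of character indices
inputs, targets = x[:, :-1], x[:, 1:]  # shift by one: model predicts x[t+1]
logits, _ = model(inputs)
loss = loss_fn(logits.reshape(-1, 128), targets.reshape(-1))
loss.backward()
optimizer.step()
```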



Karpathy's blog post shows C source code generated character by character by an RNN. The model impressively captures long-range dependencies, such as matching opening and closing parentheses.
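Continuing the sketch above, generation works by feeding one character at a time and reusing the returned hidden state (again an illustrative sketch, not the post's code). The state is what lets the model "remember" an open parenthesis from hundreds of characters earlier, which a low-order Markov chain cannot do.

```python
def sample(model, start_idx, length=200):
    """Generate character indices one at a time, threading the LSTM state."""
    model.eval()
    idx, state, out = start_idx, None, [start_idx]
    with torch.no_grad():
        for _ in range(length):
            logits, state = model(torch.tensor([[idx]]), state)
            probs = torch.softmax(logits[0, -1], dim=-1)
            idx = torch.multinomial(probs, 1).item()
            out.append(idx)
    return out
```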
