Keras LSTM state versus a feed-forward network with a sliding window

In the default mode (stateful = False) of the Keras LSTM implementation, all samples in a batch are independent, and state does not propagate from one sample to the next. As far as I can tell, the input sequence length (L) is then the only thing that maintains state in the LSTM. But this limits state propagation to a fixed number of time steps, namely L. Theoretically, what is the advantage of this mode of operation over a feed-forward NN with a sliding window of fixed size L, where each input to the NN is a vector of L consecutive inputs?
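To make the comparison concrete, here is a minimal sketch of the two setups I mean, using the Keras Sequential API; the layer sizes and the value of L are placeholders I picked for illustration, not anything prescribed:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

L, n_features = 50, 1  # illustrative window length and feature count

# Stateless LSTM (stateful=False is the default): each sample is an
# independent sequence of L time steps; state is reset after every sample.
lstm_model = Sequential([
    LSTM(32, input_shape=(L, n_features)),
    Dense(1),
])

# Feed-forward network on the same sliding window, flattened into a
# single vector of L consecutive inputs.
ff_model = Sequential([
    Dense(32, activation='relu', input_shape=(L * n_features,)),
    Dense(1),
])
```

In both cases the model only ever sees L time steps at once; the question is what, if anything, the stateless LSTM buys over the flattened-window network.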

In theory, LSTMs should be able to learn long-range dependencies, spanning even 1000 time steps. But doesn't that require me to have L = 1000, since there is no way to capture dependencies longer than the length of the input sequence? I know it is possible to use stateful mode by formatting the input so that the i-th sample of each batch is a continuation of the i-th sample of the previous batch (see the sketch below). What I'm having a hard time figuring out is what advantage the default (stateless) LSTM mode has over a feed-forward NN with a sliding window across the input.
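For reference, a rough sketch of the stateful formatting I'm describing, assuming the classic Keras 2 API where layers accept batch_input_shape; the sizes and the all-zeros data are placeholders just to show the mechanics:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, L, n_features = 4, 50, 1  # illustrative values

# With stateful=True, sample i of batch t is treated as the continuation
# of sample i of batch t-1, so state can carry across far more than L steps.
stateful_model = Sequential([
    LSTM(32, stateful=True, batch_input_shape=(batch_size, L, n_features)),
    Dense(1),
])
stateful_model.compile(loss='mse', optimizer='adam')

# Dummy data: 10 consecutive windows for each of the 4 parallel streams.
x = np.zeros((batch_size * 10, L, n_features))
y = np.zeros((batch_size * 10, 1))

# Training must preserve batch order (shuffle=False) so each stream stays
# aligned across batches; state is reset manually at sequence boundaries.
stateful_model.fit(x, y, batch_size=batch_size, shuffle=False, epochs=1)
stateful_model.reset_states()
```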
