Keras LSTM state vs. feed forward network with sliding window


In the default mode (stateful=False) of Keras' LSTM implementation, all samples in a batch are independent and the state is not propagated from one sample to the next. As I understand it, the input sequence length L is then the only way to have the LSTM maintain state, which restricts state propagation to a fixed number of time steps, namely L. Theoretically, what advantage does this mode of operation have over a feed-forward NN with a fixed-size sliding input window, where each input to the NN is a vector of L consecutive values?

In theory, LSTMs should be able to learn long-range dependencies spanning even 1000 time steps. But doesn't this require me to set L = 1000, since there is no way to capture dependencies longer than the input sequence length? I know that one can use the stateful mode by formatting the input data such that the i-th sample of each batch is a continuation of the i-th sample of the previous batch. I am having a hard time understanding what advantage the default LSTM mode has over a feed-forward NN with a sliding window over the input data.
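For concreteness, here is a minimal sketch of the two setups being compared (hypothetical layer sizes, assuming tf.keras and a univariate series split into samples of length L):

```python
from tensorflow import keras

L = 50  # window / sequence length (hypothetical)

# Stateless LSTM: each sample is an independent sequence of L time steps.
lstm = keras.Sequential([
    keras.Input(shape=(L, 1)),
    keras.layers.LSTM(32),      # stateful=False is the default
    keras.layers.Dense(1),
])

# Feed-forward net on a sliding window: each sample is a flat vector of L values.
ffnn = keras.Sequential([
    keras.Input(shape=(L,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
```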

1 Answer

The main difference between a feed-forward NN (FFNN) and any recurrent net (RNN, LSTM, ...) is the presence of recurrent connections through time.

Using a FFNN with sliding windows might get you somewhere, but its internal representation is based only on the inputs inside the current window at time "t", whereas a recurrent net also makes use of previously seen data through its hidden state, as sketched below.
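A toy illustration of that point (plain NumPy, not the actual Keras internals): the feed-forward step sees only the current window, while the recurrent step also receives a hidden state that summarises everything seen before it.

```python
import numpy as np

def ffnn_step(window, W, b):
    # Prediction depends only on the L values inside the current window.
    return np.tanh(W @ window + b)

def rnn_step(h_prev, x_t, W_h, W_x, b):
    # Prediction depends on the current input x_t AND on h_prev,
    # which summarises all previously seen inputs.
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)
```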

Stateless vs. stateful: I won't go into detail since there are already a lot of good posts on SO covering this topic. What matters here is that in stateful mode the state is carried from the i-th sample of one batch to the i-th sample of the next, and is only cleared when you reset it yourself, rather than being reset after every sample. Information can therefore propagate beyond the sequence length, which a regular FFNN with a sliding window cannot do.
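A minimal stateful sketch (hypothetical sizes, assuming tf.keras): the batch size must be fixed, batches must be fed in chronological order, and the state is only cleared when you reset it explicitly.

```python
from tensorflow import keras

batch_size, L = 32, 50  # hypothetical sizes

model = keras.Sequential([
    keras.Input(shape=(L, 1), batch_size=batch_size),
    keras.layers.LSTM(32, stateful=True),   # state of sample i carries over to
    keras.layers.Dense(1),                  # sample i of the next batch
])
model.compile(optimizer="adam", loss="mse")

# Consecutive batches must contain consecutive chunks of each series,
# so shuffling has to be disabled:
# model.fit(x, y, batch_size=batch_size, shuffle=False, epochs=10)

# Clear the carried state manually, e.g. between epochs or between series.
model.reset_states()
```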