When using Keras with TF 1.14 (non-eager), is it possible to train an LSTM step by step to be more memory efficient?
Coming from PyTorch: there I could feed the sequence one step at a time, run backpropagation, and carry the hidden state over to the next step without detaching the gradient through time (as long as there was enough memory).
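A minimal sketch of what I mean (all sizes and the `Linear` "head" are illustrative, not my actual model):

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len = 8, 16, 5

cell = nn.LSTMCell(input_size, hidden_size)
head = nn.Linear(hidden_size, 1)  # stands in for the deep part after the LSTM
opt = torch.optim.SGD(list(cell.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(seq_len, 1, input_size)  # (time, batch, features)
target = torch.randn(seq_len, 1, 1)

h = torch.zeros(1, hidden_size)
c = torch.zeros(1, hidden_size)

opt.zero_grad()
for t in range(seq_len):
    h, c = cell(x[t], (h, c))  # hidden state carried across steps
    loss = nn.functional.mse_loss(head(h), target[t])
    # retain_graph keeps the graph alive so gradients can still flow
    # through h and c at later steps
    loss.backward(retain_graph=True)
opt.step()
```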
Currently, the LSTM is trained by loading an entire truncated time sequence at once and processing it with Keras' TimeDistributed.
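Roughly like this (layer sizes again illustrative):

```python
import tensorflow as tf

seq_len, features = 5, 8
model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, features)),
    tf.keras.layers.LSTM(16, return_sequences=True),
    # the deep part after the LSTM, applied at every time step,
    # so its intermediate activations exist for the whole sequence
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(32, activation="relu")),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
])
model.compile(optimizer="sgd", loss="mse")
```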
Assuming the network after the LSTM is deep, the PyTorch version should need far less memory, since the intermediate results of the post-LSTM layers only have to be stored for a single time step (given the same truncated sequence length).
How can I achieve similar behaviour with Keras?