How to interpret this loss curve of textsum model?


I have been training the textsum seq2seq-with-attention model for abstractive summarization on a training corpus of 600k articles + abstracts. Can this be regarded as convergence? If so, is it plausible that it converged after fewer than roughly 5k steps? Considerations:

  • I've trained on a vocab size of 200k
  • 5k steps (until approximate convergence) with a batch size of 4 means that at most 20k distinct samples were seen, which is only a small fraction of the entire training corpus (see the sketch after this list).
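
To make the coverage argument in the second bullet concrete, here is the same arithmetic as a small Python check. The numbers are the ones stated in the question; nothing else is assumed:

```python
# Back-of-the-envelope check: how much of the corpus has the model
# actually seen after ~5k steps? (Numbers taken from the question above.)
corpus_size = 600_000  # articles + abstracts in the training set
batch_size = 4
steps = 5_000

samples_seen = steps * batch_size           # upper bound; sampling may repeat examples
epoch_fraction = samples_seen / corpus_size

print(f"samples seen: {samples_seen:,}")                 # 20,000
print(f"fraction of one epoch: {epoch_fraction:.1%}")    # ~3.3%
```

So at approximate convergence the model has seen at most about 3.3% of one epoch, which is what makes such an early plateau suspicious.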

Or am I actually not reading my dog's face in the tea leaves, and is the marginal negative slope to be expected?
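
One way to distinguish a genuine plateau from plotting noise is to smooth the raw loss before reading its slope: a small but consistent negative slope on the smoothed curve suggests the model is still learning. Below is a minimal sketch, assuming a hypothetical `loss_per_step.txt` file (one loss value per step, parsed from the training logs); the file name and the window size are illustrative, not part of textsum:

```python
# Smooth a noisy per-step loss curve with a moving average before
# eyeballing convergence. `loss_per_step.txt` is a hypothetical log
# containing one loss value per training step.
import numpy as np
import matplotlib.pyplot as plt

losses = np.loadtxt("loss_per_step.txt")

window = 200  # smoothing window in steps (tune to taste)
smoothed = np.convolve(losses, np.ones(window) / window, mode="valid")

plt.plot(losses, alpha=0.3, label="raw loss")
plt.plot(np.arange(window - 1, len(losses)), smoothed,
         label=f"moving average ({window} steps)")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```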

[Figure: loss over steps]


1 Answer

Answer from anthnyprschka:

OK, so I switched to training on a GPU (instead of a CPU) and confirmed that the model was still learning. Here is the learning curve after initializing a completely new model: [Figure: loss curve of the freshly initialized model]

The speedup was roughly 30x, training on an AWS p2.xlarge with an NVIDIA K80.