How to interpret this loss curve of textsum model?


I have been training the textsum seq2seq-with-attention model for abstractive summarization on a training corpus of 600k articles + abstracts. Can this be regarded as convergence? If so, is it plausible that it converged after fewer than roughly 5k steps? Considerations:

  • I've trained on a vocab size of 200k
  • 5k steps (until approximate convergence) with a batch size of 4 means that at most 20k distinct samples were seen, which is only a small fraction of the entire training corpus (see the sketch after this list).
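
To make the coverage argument in the second bullet concrete, here is the same arithmetic as a small Python check. The numbers are the ones stated in the question; nothing else is assumed:

```python
# Back-of-the-envelope check: how much of the corpus has the model
# actually seen after ~5k steps? (Numbers taken from the question above.)
corpus_size = 600_000  # articles + abstracts in the training set
batch_size = 4
steps = 5_000

samples_seen = steps * batch_size           # upper bound; sampling may repeat examples
epoch_fraction = samples_seen / corpus_size

print(f"samples seen: {samples_seen:,}")                 # 20,000
print(f"fraction of one epoch: {epoch_fraction:.1%}")    # ~3.3%
```

So at approximate convergence the model has seen at most about 3.3% of one epoch, which is what makes such an early plateau suspicious.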

Or am I actually not reading my dog's face in the tea leaves, and is the marginal negative slope to be expected?
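
One way to distinguish a genuine plateau from plotting noise is to smooth the raw loss before reading its slope: a small but consistent negative slope on the smoothed curve suggests the model is still learning. Below is a minimal sketch, assuming a hypothetical `loss_per_step.txt` file (one loss value per step, parsed from the training logs); the file name and the window size are illustrative, not part of textsum:

```python
# Smooth a noisy per-step loss curve with a moving average before
# eyeballing convergence. `loss_per_step.txt` is a hypothetical log
# containing one loss value per training step.
import numpy as np
import matplotlib.pyplot as plt

losses = np.loadtxt("loss_per_step.txt")

window = 200  # smoothing window in steps (tune to taste)
smoothed = np.convolve(losses, np.ones(window) / window, mode="valid")

plt.plot(losses, alpha=0.3, label="raw loss")
plt.plot(np.arange(window - 1, len(losses)), smoothed,
         label=f"moving average ({window} steps)")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```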

[Figure: loss over steps]


1 Answer

Answer from anthnyprschka:

OK, so I switched to training on a GPU (instead of a CPU) and confirmed that the model was still learning. Here is the learning curve after initializing a completely new model: [Figure: loss curve of the freshly initialized model]

The speedup was roughly 30x, training on an AWS p2.xlarge with an NVIDIA K80.