I am working on TensorFlow's textsum (text summarization) model. I have started training it on the sample data, i.e. the toy dataset provided with the model when cloning it from Git. How long should it take to train the model and decode with this sample dataset? It has already been running for more than 17 hours.
How long does it take to train TensorFlow textsum on the sample (toy) data?
3.8k views · Asked by Kajal Kodrani

2 answers
On my i5 processor, using only the CPU, it took about 60 hours to reach a running average loss of about 0.17 on the toy training dataset. With 8 GB of RAM, it also consumed about 10 GB of additional swap; more RAM and a GPU would likely have given better results. I can't currently share the running-average-loss graph from TensorBoard, but I hope this answers your question.
Unfortunately, the toy training set is only meant to let you watch the overall flow of the model, not to produce decent results. There is simply not enough data in the toy dataset to get good results.
How much time it takes is difficult to say, as it is entirely relative to the hardware you are running on. Normally you train until the running average loss is somewhere between 2 and 1; Xin Pan stated that with larger datasets you should never go below a 1.0 average loss. On my 980M, I reached that in less than a day with the toy dataset.
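The "average loss" both answers refer to is an exponentially decayed running average of the per-step training loss, which textsum logs to TensorBoard. A minimal sketch of that kind of smoothing is below; the decay constant and the cap on early values are assumptions for illustration, not textsum's exact settings.

```python
def running_avg_loss(loss, avg, decay=0.99):
    """Exponentially decayed running average of the training loss.

    The decay value and the cap are hypothetical choices to show the
    idea; the real model may use different constants.
    """
    # Seed the average with the first observed loss.
    avg = loss if avg == 0 else avg * decay + (1.0 - decay) * loss
    # Cap the value so early spikes don't dwarf the rest of the curve.
    return min(avg, 12.0)

# Simulate a per-step loss that decays toward ~1.0 and smooth it.
avg = 0.0
for step in range(1000):
    loss = 1.0 + 5.0 * (0.995 ** step)  # hypothetical loss curve
    avg = running_avg_loss(loss, avg)
print(round(avg, 2))  # smoothed loss ends up close to 1.0
```

The smoothed curve lags the raw loss slightly, which is why it is a steadier signal for deciding when a run has converged.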
That said, my results were really bad and I thought something was wrong. It turned out the only problem was that I didn't have enough data. I then scraped about 40k articles, and the results were still not acceptable. Recently I trained against 1.3 million articles, and the results are much better. After further analysis, this is primarily because the textsum model is abstractive rather than extractive.
Hope this helps somewhat. With the 1.3 million articles and the batch size set to 64, I was able to train the model on my hardware in less than a week and a half using TF 0.9, CUDA 7.5, and cuDNN 4. I hear the newer CUDA/cuDNN releases are supposed to be faster, but I can't speak to that yet.
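To put that 1.3-million-article run in perspective, here is the back-of-envelope arithmetic for its scale. Only the corpus size and batch size come from the answer above; the number of passes over the data and the per-step time are hypothetical figures used purely to illustrate the calculation.

```python
# Figures from the answer: corpus size and batch size.
articles = 1_300_000
batch_size = 64

# Optimizer steps needed to see every article once.
steps_per_epoch = articles // batch_size
print(steps_per_epoch)

# Hypothetical: 10 passes over the data at ~0.5 s per step.
seconds = steps_per_epoch * 10 * 0.5
print(round(seconds / 86_400, 1))  # total wall-clock time in days
```

Even with made-up per-step timings, the point stands: the jump from 40k to 1.3M articles multiplies training time by over 30x, so a run measured in days rather than hours is expected.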