How to prepare test data for textsum?


I have been able to successfully run the pre-trained model of TextSum (TensorFlow 1.2.1). The output consists of summaries of CNN & Dailymail articles (which are chunked into bin format prior to testing).

I have also been able to create the aforementioned bin-format test data for the CNN/Dailymail articles and the vocab file (per the instructions here). However, I am not able to create my own test data to check how good the summary is. I have tried modifying the make_datafiles.py code to remove the hard-coded values. I am able to create the tokenized files, but the next step seems to be failing. It would be great if someone could help me understand what the url_lists are being used for. Per the GitHub readme:

"For each of the url lists all_train.txt, all_val.txt and all_test.txt, the corresponding tokenized stories are read from file, lowercased and written to serialized binary files train.bin, val.bin and test.bin. These will be placed in the newly-created finished_files directory."

How is a URL such as http://web.archive.org/web/20150401100102id_/http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/ being mapped to the corresponding story in my data folder? If someone has had success with this, please do let me know how to go about this. Thanks in advance!

1 Answer

KRW4 (accepted answer):

Update: I was able to figure out how to use my own data to create bin files for testing (and avoid using url_lists altogether).
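To address the URL question: as far as I can tell from the original make_datafiles.py, the url_lists are only used to select and name the stories. Each URL is SHA1-hashed, and the script then expects a tokenized file named <sha1(url)>.story in the stories folder. Roughly like this (hashhex mirrors the helper in that script; the URL is just the example from the question):

```python
import hashlib

def hashhex(s):
    """Return the hex digest of the SHA1 hash of the string s."""
    h = hashlib.sha1()
    h.update(s.encode('utf-8'))
    return h.hexdigest()

url = ("http://web.archive.org/web/20150401100102id_/"
       "http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/")

# The tokenized story for this URL is expected at <stories_dir>/<sha1(url)>.story
story_filename = hashhex(url) + ".story"
print(story_filename)
```

For your own articles there is no such URL, which is why skipping the url_lists step entirely is simpler.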

This will be helpful - https://github.com/dondon2475848/make_datafiles_for_pgn
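For reference, the core of writing your own test data looks roughly like this (a minimal sketch assuming the tf.Example-based binary format that textsum / pointer-generator reads; the write_bin helper and the sample pairs are my own names, and the article/abstract strings should already be tokenized and lowercased):

```python
import struct
from tensorflow.core.example import example_pb2

def write_bin(article_abstract_pairs, out_file):
    """Write (article, abstract) string pairs to a textsum-style .bin file.

    Each record is a serialized tf.Example preceded by its length packed
    as an 8-byte integer, which is what the textsum batch reader expects.
    """
    with open(out_file, 'wb') as writer:
        for article, abstract in article_abstract_pairs:
            tf_example = example_pb2.Example()
            tf_example.features.feature['article'].bytes_list.value.extend(
                [article.encode('utf-8')])
            # Abstract sentences are wrapped in <s>...</s> tags, as in make_datafiles.py
            tf_example.features.feature['abstract'].bytes_list.value.extend(
                [abstract.encode('utf-8')])
            tf_example_str = tf_example.SerializeToString()
            writer.write(struct.pack('q', len(tf_example_str)))
            writer.write(struct.pack('%ds' % len(tf_example_str), tf_example_str))

# Example: one test article with its reference summary
pairs = [("my tokenized , lowercased article text .",
          "<s> my tokenized , lowercased summary . </s>")]
write_bin(pairs, 'finished_files/test.bin')
```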

I will update this answer once I figure out how to fix the ROUGE scoring for this.