In a previous question I recently posted on Stack Overflow, How to bulk download files from the internet archive, I thought I had found a way to solve my problem by using a minimal subset of the commands posted on the Internet Archive help blog. As a reminder, here is the version of the command posted on their blog:
wget -r -H -nc -np -nH --cut-dirs=1 -A .pdf,.epub -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'
and here is my own version of the command:
wget --cut-dirs=1 -A .pdf,.epub -e robots=off -i ./itemlist.txt -B 'http://archive.org/download/'
The command runs without errors, but instead of the PDF and EPUB files I wanted, I get files with random-looking numerical extensions, such as:
arxiv-1411.7162
arxiv-1412.0666
arxiv-1410.8703
etc. After opening these files in a text editor, I found that they are actually HTML files. I opened a few of them in a local browser, and each one is a web page with a link to the PDF I want near the bottom. I would like to know how I can download the actual files automatically, without manual intervention, and whether I should change my previous steps.
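
In case it helps, this is roughly the kind of post-processing I have in mind (an untested sketch, assuming the saved pages sit in the current directory as arxiv-* files and contain an absolute href="...pdf" link; if the links turn out to be relative this would need a base URL as well):

# Untested sketch: pull the first .pdf link out of each saved HTML page and fetch it.
for page in arxiv-*; do
    # Grab the first href="....pdf" attribute and keep only the URL between the quotes.
    url=$(grep -o 'href="[^"]*\.pdf"' "$page" | head -n 1 | cut -d'"' -f2)
    if [ -n "$url" ]; then
        wget -nc "$url"
    fi
done

That said, I suspect the real fix is in my original wget flags rather than in post-processing the HTML pages, so I'd appreciate guidance on that as well.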