In a previous question I recently posted on Stack Overflow, How to bulk download files from the internet archive, I thought I had found a way to solve my problem by using a minimal subset of the commands posted on the Internet Archive help blog. As a reminder, here is the version of the command posted on their blog:
wget -r -H -nc -np -nH --cut-dirs=1 -A .pdf,.epub -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'
and here is my own version of the command:
wget --cut-dirs=1 -A .pdf,.epub -e robots=off -i ./itemlist.txt -B 'http://archive.org/download/'
The command runs without errors, but instead of the PDF and EPUB files I wanted, I get files with random-looking numerical extensions, such as:
arxiv-1411.7162
arxiv-1412.0666
arxiv-1410.8703
etc. After opening these files in a text editor, I found that they are actually HTML files. I opened a few of them in a local browser, and each one is a web page with a link to the PDF I want near the bottom. I would like to know how I can download the actual files automatically, without manual intervention, and whether I should change my previous steps.
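
In case it helps, this is roughly the kind of post-processing I have in mind (an untested sketch, assuming the saved pages sit in the current directory as arxiv-* files and contain an absolute href="...pdf" link; if the links turn out to be relative this would need a base URL as well):

# Untested sketch: pull the first .pdf link out of each saved HTML page and fetch it.
for page in arxiv-*; do
    # Grab the first href="....pdf" attribute and keep only the URL between the quotes.
    url=$(grep -o 'href="[^"]*\.pdf"' "$page" | head -n 1 | cut -d'"' -f2)
    if [ -n "$url" ]; then
        wget -nc "$url"
    fi
done

That said, I suspect the real fix is in my original wget flags rather than in post-processing the HTML pages, so I'd appreciate guidance on that as well.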