Where can we get Wikipedia dumps for each year from 2010?


I was wondering if there are English Wikipedia (enwiki) page article dumps for previous years (2010-2019).

The data dump torrents at https://meta.wikimedia.org/wiki/Data_dump_torrents don't seem to have dumps for years prior to 2017.


There are 2 best solutions below

Answer (score 1):

The Internet Archive has some old dumps, and the dumps page has information about some really old dumps.

Answer (score 4):

Actually, you don't need them! If you need the history of pages, just download a dump with "history" in the name: it contains every revision since Wikipedia began. You would have to parse the wikitext to extract the metadata, though; on the other hand, such data would probably be more reliable for research or practical use than old dumps.
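Since a history dump carries every revision, the state of a page in any past year can be reconstructed by taking the latest revision at or before a cutoff date. A minimal sketch of that selection logic, using invented revision data in place of real `<revision>` elements from a dump:

```python
from datetime import datetime

# Toy revision history for one page: (timestamp, text) pairs.
# In a real pages-meta-history dump these would come from <revision>
# elements; the data here is invented purely for illustration.
revisions = [
    ("2009-06-01T12:00:00Z", "text as of mid-2009"),
    ("2010-03-15T08:30:00Z", "text as of early 2010"),
    ("2014-11-02T19:45:00Z", "much later text"),
]

def snapshot_at(revisions, cutoff):
    """Return the text of the latest revision at or before `cutoff`."""
    cutoff_dt = datetime.strptime(cutoff, "%Y-%m-%dT%H:%M:%SZ")
    best = None
    for ts, text in revisions:
        dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        if dt <= cutoff_dt and (best is None or dt > best[0]):
            best = (dt, text)
    return best[1] if best else None

# The page as it stood at the end of 2010:
print(snapshot_at(revisions, "2010-12-31T23:59:59Z"))
# -> text as of early 2010
```

Running this over every page in the dump effectively rebuilds the missing 2010 snapshot from the full history.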

Which dump you need to download depends on your use case. Do you only want revision metadata, e.g. to see which users contributed and when? Then stub-meta-history.xml is the way to go. Do you want the page content as well, to parse all of it? Then pages-meta-history is your choice. Be warned that for enwiki these dumps are really big: about 14 TiB uncompressed as of May 2016, per https://meta.wikimedia.org/wiki/Data_dumps/FAQ#How_big_are_the_en_wikipedia_dumps_uncompressed?, since they contain all of Wikipedia, including history.
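At that size the dump cannot be loaded into memory whole, so it has to be processed as a stream. A sketch with Python's `xml.etree.ElementTree.iterparse`, run here on a tiny in-memory stand-in for the dump (real enwiki dumps use the same `<page>`/`<revision>` nesting but add an XML namespace and are bz2-compressed, so this sample is deliberately simplified):

```python
import io
import xml.etree.ElementTree as ET

# A tiny, simplified stand-in for a MediaWiki history dump.
sample = io.StringIO("""\
<mediawiki>
  <page>
    <title>Example</title>
    <revision><timestamp>2010-01-01T00:00:00Z</timestamp></revision>
    <revision><timestamp>2011-01-01T00:00:00Z</timestamp></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><timestamp>2012-01-01T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>
""")

# Stream the file element by element and free each <page> after
# processing it, so memory use stays flat even on a huge dump.
revision_counts = {}
for event, elem in ET.iterparse(sample, events=("end",)):
    if elem.tag == "page":
        title = elem.findtext("title")
        revision_counts[title] = len(elem.findall("revision"))
        elem.clear()  # drop the subtree we just processed

print(revision_counts)
# -> {'Example': 2, 'Other': 1}
```

The same loop works on the real dump if `sample` is replaced by a file object over the (decompressed) XML and the namespace-qualified tag names are used.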