Wikipedia revision history using pywikibot: size of revision (follow-up)


I would like to follow up on this thread: Wikipedia revision history using pywikibot

I'm trying to get the "size of revision" (number of bytes changed) variable for my list of Wikipedia page revisions with pywikibot 5.2.0, but I can't find the corresponding variable in my output.

Here is how I have defined my function:

import pywikibot

def get_page_revisions(page, site=pywikibot.Site("pl", "wikipedia")):
    return pywikibot.Page(site, page).revisions(content=False)

I import an R file containing a vector of 3,245 strings (titles of Wikipedia pages) into my Python editor and run get_page_revisions on these titles, which returns 379,426 revisions with 12 columns: "revid, _text, timestamp, user, anon, comment, minor, rollbacktoken, _parent_id, _content_model, _sha1, slots".

The size information is missing. What am I doing wrong?

Thank you!

Answer

Page.revisions() returns a generator. Iterating over it yields Revision objects that hold the revision data; alternatively, convert the generator to a list of these objects. Each Revision object provides several items, including 'size'. For example:

import pywikibot

site = pywikibot.Site('pl', 'wikipedia')
page = pywikibot.Page(site, 'Foo')
rgen = page.revisions()  # generator of Revision objects, newest first
rev = next(rgen)  # fetch only the latest revision
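Because the generator is lazy, next(rgen) yields only the newest revision, whereas list(rgen) walks the page's entire history. If you only need the first few revisions, itertools.islice stops iterating after n items. A minimal sketch: first_revisions and fake_revisions are hypothetical helpers, and the fake generator of plain dicts stands in for page.revisions() so the example runs without network access.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def first_revisions(rgen: Iterable[T], n: int) -> List[T]:
    """Return at most n items from a (possibly lazy) revision iterator."""
    # islice stops pulling from the generator once n items have been taken.
    return list(islice(rgen, n))


def fake_revisions() -> Iterator[Dict[str, int]]:
    """Hypothetical offline stand-in for page.revisions() (newest first)."""
    for revid, size in [(30, 1500), (20, 1200), (10, 900)]:
        yield {"revid": revid, "size": size}


if __name__ == "__main__":
    print([rev["size"] for rev in first_revisions(fake_revisions(), 2)])  # [1500, 1200]
```

With pywikibot you would pass the real generator instead, e.g. first_revisions(page.revisions(), 50). Note that pywikibot queries the API in batches, so a few more revisions than n may be downloaded behind the scenes.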

To get the size, use

rev.size

or

rev['size']
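Keep in mind that, per the MediaWiki API documentation for prop=revisions, size is the length of the whole revision in bytes, not the number of bytes changed shown on the history page. The change is the difference from the parent revision's size. A minimal sketch, assuming each revision exposes revid, parentid and size as in the key list of this answer; byte_deltas is a hypothetical helper and plain dicts stand in for Revision objects. A parentid of 0 marks a page creation, and a parent outside the fetched set yields None because its size is unknown.

```python
from typing import Dict, Iterable, Mapping, Optional


def byte_deltas(revisions: Iterable[Mapping]) -> Dict[int, Optional[int]]:
    """Map each revid to the bytes added (+) or removed (-) by that revision."""
    revs = list(revisions)  # consume the generator once; two passes are needed
    sizes = {rev["revid"]: rev["size"] for rev in revs}
    deltas: Dict[int, Optional[int]] = {}
    for rev in revs:
        parent = rev["parentid"]
        if parent == 0:
            # Page creation: the whole text is new.
            deltas[rev["revid"]] = rev["size"]
        elif parent in sizes:
            deltas[rev["revid"]] = rev["size"] - sizes[parent]
        else:
            # Parent revision was not fetched, so the change is unknown.
            deltas[rev["revid"]] = None
    return deltas


if __name__ == "__main__":
    history = [  # newest first, like page.revisions()
        {"revid": 30, "parentid": 20, "size": 1150},
        {"revid": 20, "parentid": 10, "size": 1200},
        {"revid": 10, "parentid": 0, "size": 900},
    ]
    print(byte_deltas(history))  # {30: -50, 20: 300, 10: 900}
```

Applied to a real page this would be byte_deltas(page.revisions()); fetching the complete history keeps every parent in the set, so no delta comes back as None.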

All provided items can be listed, for example, with:

list(rev.keys())

which should show:

['revid', 'parentid', 'user', 'userid', 'timestamp', 'size', 'sha1', 'roles',
 'slots', 'comment', 'parsedcomment', 'tags', 'anon', 'minor', 'userhidden', 'text',
 'contentmodel']
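Since the question's end goal is a table to analyse in R, one option is to keep only the keys you need and write them to CSV with the standard library. A minimal sketch: write_revisions_csv and the COLUMNS selection are hypothetical, plain dicts stand in for Revision objects, and keys a revision lacks are written as empty cells.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence, TextIO, Tuple

# Hypothetical column selection; adjust to the keys you actually need.
COLUMNS = ("title", "revid", "parentid", "timestamp", "user", "size", "minor")


def write_revisions_csv(out: TextIO,
                        pages: Iterable[Tuple[str, Iterable[Mapping]]],
                        columns: Sequence[str] = COLUMNS) -> int:
    """Write one CSV row per revision and return the number of rows written."""
    # restval fills keys a revision lacks; extrasaction="ignore" drops
    # "title" if it is not among the selected columns.
    writer = csv.DictWriter(out, fieldnames=list(columns),
                            restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for title, revisions in pages:
        for rev in revisions:
            row = {key: rev[key] for key in columns if key in rev}
            row["title"] = title
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_revisions_csv(buf, [("Foo", [{"revid": 1, "parentid": 0, "size": 100}])])
    print(buf.getvalue())
```

With pywikibot, pass pairs such as (title, get_page_revisions(title)) for each title, and open the output file with newline="" as the csv module documentation recommends; R's read.csv() can then load the result.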