Extract changes from Wikipedia/Wikimedia revision pages

I have a simple question regarding the Wikimedia/Wikipedia API.

I have to fetch the changes made in a list of "revids". I am able to fetch the XML content for a batch of "revids", but I have not been able to extract only the changed text.

Does the API provide any way to extract only the changed sentences? If not, is there an external script or module that can do this job?

Query to fetch the revision details: https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvprop=user|userid|ids|tags|comment|content&format=jsonfm&revids=1228415

I would appreciate any suggestions/solutions that could solve this issue!

(Currently, I am using the wikitools Python module to make the queries.)

There is 1 best solution below

You can get the diff between the old and new text with action=compare. However, it segments text by wikitext lines rather than sentences, its output isn't meant to be machine-readable, and it is generally not that helpful for this purpose. Since you are using Python, the client-side library deltas will probably work better for you.
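
To illustrate the client-side approach, here is a minimal sketch of a sentence-level diff between two revision texts. It uses Python's standard-library difflib rather than deltas, so treat it as a stand-in for the idea, not as the deltas API. The helper names split_sentences and changed_sentences, the naive regex splitter, and the sample texts are all assumptions made for this example; real wikitext (templates, references, abbreviations) would need a more careful splitter.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def changed_sentences(old_text, new_text):
    """Return (removed, added) sentences between two revision texts."""
    old = split_sentences(old_text)
    new = split_sentences(new_text)
    # Compare the two lists of sentences instead of individual characters.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added


# Hypothetical sample texts standing in for the content of two revisions.
old_rev = "Paris is the capital of France. It lies on the Seine. It is large."
new_rev = ("Paris is the capital of France. It lies on the river Seine. "
           "It is large. It hosts the Louvre.")

removed, added = changed_sentences(old_rev, new_rev)
print("removed:", removed)
print("added:", added)
```

In practice, old_text and new_text would be the content fields of a revision and its parent, fetched with the revisions query from the question.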