I have a simple query regarding the Wikimedia/Wikipedia API.
I need to fetch the changes made in a list of revisions, identified by their "revids". I am able to fetch the XML content for a batch of "revids", but I have not managed to extract only the changed text.
Does the API provide any way to extract only the changed sentences? If not, is there an external script/module that can do this job?
Query to fetch the revision details: https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvprop=user|userid|ids|tags|comment|content&format=jsonfm&revids=1228415
I would appreciate any suggestions/solutions that could solve this issue!
(Currently, I am using the wikitools Python module to make the queries.)
You can get the diff between the old and new text with action=compare, but it segments text by wikitext lines, not sentences, isn't meant to be machine-readable, and is generally not that helpful. Since you are using Python, the client-side library deltas will probably work better for you.
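To illustrate the client-side approach, here is a minimal sketch that uses only Python's standard difflib module rather than deltas, so it is a stand-in for the idea and not that library's API. It assumes you have already fetched the wikitext of the old and the new revision (for example with rvprop=content as in the question). The helper names split_sentences and changed_sentences are made up for this example, and the regex sentence splitter is deliberately naive: it will mis-split abbreviations and wiki markup such as templates or references.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def changed_sentences(old_text, new_text):
    """Return (removed, added) sentence lists between two revision texts."""
    old = split_sentences(old_text)
    new = split_sentences(new_text)
    # Compare the two revisions as sequences of whole sentences.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added


# Hypothetical revision texts, standing in for content fetched from the API.
old = "Python is a language. It was created in 1991. It is popular."
new = ("Python is a programming language. It was created in 1991. "
       "It is popular. It is free.")

removed, added = changed_sentences(old, new)
print("Removed:", removed)
print("Added:", added)
```

A sentence that was only edited shows up once in the removed list (old wording) and once in the added list (new wording). For real articles you would probably want a proper sentence tokenizer and to strip or parse the wikitext before diffing.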