How can I parse Wikipedia XML with PHP? I tried SimplePie, but I got nothing. Here is the URL whose data I want to retrieve:
http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content&format=xml
Edit code:
<?php
define("EMAIL_ADDRESS", "[email protected]");
$ch = curl_init();
$cv = curl_version();
// Use {$var['key']} interpolation; the "${var['key']}" form is deprecated as of PHP 8.2.
$user_agent = "curl {$cv['version']} ({$cv['host']}) libcurl/{$cv['version']} {$cv['ssl_version']} zlib/{$cv['libz_version']} <" . EMAIL_ADDRESS . ">";
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($ch, CURLOPT_ENCODING, "deflate, gzip, identity");
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_URL, "http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content&format=xml");
$xml = curl_exec($ch);
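$xml_reader = new XMLReader();
$xml_reader->xml($xml, "UTF-8");
// $xml is a plain string, not an object, so walk the document with XMLReader
// and print the text of every <rev> element.
while ($xml_reader->read()) {
    if ($xml_reader->nodeType === XMLReader::ELEMENT && $xml_reader->name === "rev") {
        echo $xml_reader->readString();
    }
}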
?>
I generally use a combination of cURL and XMLReader to parse XML generated by the MediaWiki API. Note that you must include your e-mail address in the User-Agent header, or else the API script will respond with HTTP 403 Forbidden. Here is how I initialize the cURL handle:
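A minimal sketch of such an initialization, modeled on the options that appear in the question's code. The names CONTACT_EMAIL, build_user_agent and init_api_handle are hypothetical, the address is a placeholder, and following redirects is an assumption added in case the endpoint answers with one.

```php
<?php
// Hypothetical placeholder; replace it with your own contact address.
const CONTACT_EMAIL = "you@example.com";

/**
 * Build a User-Agent string that names the client and a contact address.
 */
function build_user_agent(string $email): string
{
    $cv = curl_version();
    return sprintf("curl/%s (%s) <%s>", $cv["version"], $cv["host"], $email);
}

/**
 * Create a cURL handle configured for a GET request to the given URL.
 */
function init_api_handle(string $url, string $email)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, build_user_agent($email));
    curl_setopt($ch, CURLOPT_ENCODING, "");          // accept every encoding this cURL build supports
    curl_setopt($ch, CURLOPT_HEADER, false);         // keep response headers out of the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body from curl_exec() instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumption: follow a redirect if one is sent
    curl_setopt($ch, CURLOPT_URL, $url);
    return $ch;
}
```

Calling init_api_handle() with the API URL and CONTACT_EMAIL yields a handle that can be passed straight to curl_exec().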
You can then use this code, which grabs the XML and constructs a new XMLReader object in $xml_reader:
EDIT: Here is a working example:
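One possible way to do the parsing step, as a sketch rather than a definitive implementation. It assumes the response nests api, query, pages, page and rev elements (the path the question's code tries to follow) and that each page element carries a title attribute. The function name extract_revisions is hypothetical.

```php
<?php
/**
 * Collect the text of every <rev> element, keyed by the title
 * attribute of the <page> element that encloses it.
 */
function extract_revisions(string $xml): array
{
    $reader = new XMLReader();
    if (!$reader->XML($xml)) {
        throw new RuntimeException("Could not load the XML string.");
    }

    $revisions = [];
    $title = null;
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue; // skip text nodes, end tags, whitespace
        }
        if ($reader->name === "page") {
            // Remember which page the following <rev> belongs to.
            $title = $reader->getAttribute("title");
        } elseif ($reader->name === "rev") {
            // readString() returns the text content of the current element.
            $revisions[$title ?? ""] = $reader->readString();
        }
    }
    $reader->close();

    return $revisions;
}
```

With a handle in $ch, the string returned by curl_exec($ch) can be checked against false and then passed to extract_revisions(); looping over the result with foreach gives each title and its wikitext.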