Is there a better way to fetch the text contents of particular sections from Wikipedia? I have the code below to skip some sections, but the process is taking too long to fetch the data I'm looking for.
for ($i = 0; $i > 10; $i++) {
    if ($i != 2 || $i != 4) {
        $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section=' . $i;
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");
        $c = curl_exec($ch);
        $json = json_decode($c);
        $content = $json->{'parse'}->{'text'}->{'*'};
        print preg_replace('/<\/?a[^>]*>/', '', $content);
    }
}
For starters, you're telling this to loop until $i is greater than 10, which in practice will loop until the server request times out. Change it to $i < 10, or if you need only a handful of sections, loop over just the section numbers you want, as in the sketch below.
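A minimal skeleton of that approach, assuming (as your if condition suggests) that sections 2 and 4 are the ones you want to skip:

foreach (array(0, 1, 3, 5, 6, 7, 8, 9) as $i) {
    // ...same cURL fetch and print as in the question...
}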
Second, decoding JSON into an associative array like this:
$json = json_decode($c, true);
And referencing it like $json['parse']['text']['*'] is easier to work with, but that's up to you.

And third, you'll find that strip_tags() will likely function faster and more accurately than stripping tags with regular expressions.
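Putting all three changes together, a minimal sketch of the corrected loop (the section numbers are an assumption; substitute the ones you actually need):

foreach (array(0, 1, 3, 5, 6, 7, 8, 9) as $i) {
    $url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section=' . $i;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");
    $c = curl_exec($ch);
    curl_close($ch); // release the handle after each request

    // Decode straight to an associative array.
    $json = json_decode($c, true);

    // strip_tags() removes every HTML tag in one pass, not just the
    // <a> tags the original regex matched.
    print strip_tags($json['parse']['text']['*']);
}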