<?php
for ($x = 0; $x <= 25; $x++) {
    $ch = curl_init("https://uk.trustpilot.com/review/example.com?languages=all&page=$x");
    //curl_setopt($ch, CURLOPT_POST, true);
    //curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    //curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); // timeout in seconds
    $trustpilot = curl_exec($ch);
    // Check if any error occurred
    if (curl_errno($ch)) {
        die('Fatal error occurred');
    }
}
?>
This code will get all 25 pages of reviews for example.com. What I then want to do is put all the results into a JSON array or something similar.
I attempted the code below to try to retrieve all of the names:
<?php
$trustpilot = preg_replace('/\s+/', '', $trustpilot); // strip all whitespace from the response
$first = explode( '"name":"' , $trustpilot );
$second = explode('"' , $first[1] );
$result = preg_replace('/[^a-zA-Z0-9-.*_]/', '', $second[0]); // strip special characters
?>
This is clearly a lot harder than I anticipated. Does anyone know how I could get all of the reviews into JSON or something similar, for however many pages I choose? In this case, for example, I chose 25 pages worth of reviews.
Thanks!
Do not parse HTML with regex.
Use DOMDocument and DOMXPath to parse it instead. Also, you create a new curl handle for each page but never close any of them, which is a resource/memory leak in your code. It is also a waste of CPU: instead of creating a new curl handle for every page, you can just keep re-using the same handle over and over. Protip: this HTML compresses rather well, so you should use CURLOPT_ENCODING to download the pages compressed, e.g.:
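Here is a minimal sketch of that fetch loop, re-using a single handle and requesting compressed transfers; it assumes you want pages 1 through 25 of the URL in your question, and it only collects the raw HTML (parsing comes afterwards):
<?php
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_ENCODING       => '', // empty string = accept any compression curl supports (gzip, deflate, ...)
    CURLOPT_TIMEOUT        => 30,
));
$pages = array();
for ($x = 1; $x <= 25; $x++) {
    // same handle on every iteration, only the URL changes
    curl_setopt($ch, CURLOPT_URL, "https://uk.trustpilot.com/review/example.com?languages=all&page=$x");
    $html = curl_exec($ch);
    if ($html === false) {
        die('curl error: ' . curl_error($ch));
    }
    $pages[] = $html;
}
curl_close($ch); // one handle, closed once
?>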
Note that there is only 1 review for the URL you listed, and
4d6bbf8a0000640002080bc2
is the website's internal id (probably a SQL db id) for that review.
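To actually get the reviews into JSON for however many pages you fetched, something like the following could sit on top of the $pages array from the sketch above. The XPath query and the consumer-information__name class are assumptions on my part; inspect the live markup (or the JSON Trustpilot embeds in <script> tags) and adjust the query for whatever fields you want (name, rating, review text, that internal id, ...):
<?php
libxml_use_internal_errors(true); // real-world HTML is rarely perfectly valid, silence the warnings
$reviews = array();
foreach ($pages as $html) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // assumed selector for the reviewer name -- verify against the actual page source
    foreach ($xpath->query('//*[contains(@class, "consumer-information__name")]') as $node) {
        $reviews[] = array('name' => trim($node->textContent));
    }
}
echo json_encode($reviews, JSON_PRETTY_PRINT);
?>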