PHP Goutte web scraper gives different results from time to time


I'm trying to build a web scraper for Amazon product pages.

I decided to use the Goutte library (based on this freeCodeCamp tutorial).

Here's what I've coded so far:

<?php
    require 'vendor/autoload.php';

    $link = readline('Enter the product you want to scrape: ');

    $httpClient = new \Goutte\Client();
    // request() fetches the page and returns a Symfony DomCrawler Crawler
    $response = $httpClient->request('GET', $link);

    // evaluate() returns a Crawler holding the matching nodes;
    // it is empty when the page has no such element
    $title_ = $response->evaluate('//span[@id="productTitle"]');
    $price_ = $response->evaluate('//span[@class="priceBlockStrikePriceString a-text-strike"]');
    $offer_ = $response->evaluate('//span[@id="priceblock_ourprice"]');

    foreach ($title_ as $title) {
        // The title text may be padded with whitespace and newlines; trim it
        echo trim($title->textContent), "\n";
    }

    foreach ($price_ as $price) {
        // Remove all newlines and spaces from the listed price text
        echo str_replace(array("\n", " "), "", $price->textContent), "\n";
    }

    foreach ($offer_ as $offer) {
        // Same cleanup for Amazon's offer price
        echo str_replace(array("\n", " "), "", $offer->textContent), "\n";
    }
?>

As you can see, I'm trying to extract the product's title, listed price, and Amazon's offer.
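Because each foreach simply iterates over whatever evaluate() returned, a selector that matches nothing prints nothing at all, which makes a title-only run hard to interpret. A minimal sketch of reporting missing fields explicitly follows; firstText() is a hypothetical helper (not part of Goutte), and the sample HTML is a made-up stand-in for a real Amazon page. In the actual script you would pass $response->evaluate($expr) instead of $xpath->query($expr), since the Crawler returned by evaluate() is also an iterable collection of DOM nodes.

```php
<?php
// Hypothetical helper (not part of Goutte): return the trimmed text of the
// first node in $nodes, or null when the expression matched nothing.
// Accepts anything iterable over DOMNode objects, e.g. a DOMNodeList or the
// Symfony DomCrawler Crawler returned by $response->evaluate().
function firstText(iterable $nodes): ?string
{
    foreach ($nodes as $node) {
        return trim($node->textContent);
    }
    return null;
}

// Stand-in page for illustration: it has a title and an offer but no list price.
$html = '<html><body>'
    . '<span id="productTitle">  Example product  </span>'
    . '<span id="priceblock_ourprice">$6.69</span>'
    . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The same XPath expressions as in the question
$fields = [
    'title' => '//span[@id="productTitle"]',
    'price' => '//span[@class="priceBlockStrikePriceString a-text-strike"]',
    'offer' => '//span[@id="priceblock_ourprice"]',
];

foreach ($fields as $name => $expr) {
    $text = firstText($xpath->query($expr));
    echo $name, ': ', $text ?? '(not found on this page)', "\n";
}
```

Against the stand-in page this prints `title: Example product`, `price: (not found on this page)` and `offer: $6.69`, so a missing element shows up as a line of its own instead of silently disappearing.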

What really frustrates me is that the code above sometimes works and sometimes doesn't.

For example, I'm testing it with this link.

Sometimes I get the desired result:

Amazon Basics Liquid Crystal Clear Soft TPU Smartphone Cover for iPhone 13 Pro Max
$9.99
$6.69

But when I run it again, without changing anything, I only get the product's title:

Amazon Basics Liquid Crystal Clear Soft TPU Smartphone Cover for iPhone 13 Pro Max
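Since nothing in the script changes between these two runs, one plausible explanation (an assumption to verify, not a confirmed cause) is that the HTML Amazon returns differs from request to request. A minimal sketch for checking that: snapshotRun() is a hypothetical helper that records which of the question's XPath expressions matched and saves the raw HTML to a file, so a run that finds the prices can be diffed against one that doesn't. It uses only PHP's built-in DOM extension.

```php
<?php
// Hypothetical diagnostic helper: report which XPath expressions matched in a
// fetched page and save its raw HTML so separate runs can be compared.
function snapshotRun(string $html, array $fields, string $dir): array
{
    $doc = new DOMDocument();
    if ($html !== '') {
        // Real-world HTML rarely validates; collect libxml warnings instead of printing them
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
    $xpath = new DOMXPath($doc);

    // true if the expression matched at least one node
    $found = [];
    foreach ($fields as $name => $expr) {
        $found[$name] = $xpath->query($expr)->length > 0;
    }

    // Timestamp plus uniqid() so runs within the same second don't overwrite each other
    $file = $dir . DIRECTORY_SEPARATOR . 'run-' . date('Ymd-His') . '-' . uniqid() . '.html';
    file_put_contents($file, $html);

    return ['file' => $file, 'found' => $found];
}

// Example with a stand-in page that only contains the title
$fields = [
    'title' => '//span[@id="productTitle"]',
    'price' => '//span[@class="priceBlockStrikePriceString a-text-strike"]',
    'offer' => '//span[@id="priceblock_ourprice"]',
];
$result = snapshotRun(
    '<html><body><span id="productTitle">Example product</span></body></html>',
    $fields,
    sys_get_temp_dir()
);
print_r($result['found']);
echo 'Saved HTML to ', $result['file'], "\n";
```

In the real script, the raw HTML and HTTP status of the last request should be available through Symfony BrowserKit as `$httpClient->getInternalResponse()->getContent()` and `$httpClient->getInternalResponse()->getStatusCode()` (the latter assumes BrowserKit 4.3 or newer); printing the status code next to the snapshot result is a quick first check.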

What's going on? I suspect a problem with my code (I'm new to web scraping). Can anyone help? Thanks in advance.
