How to scrape images with Goutte?

I am trying to scrape the images of a hotel on booking.com. In a browser, I can see all images of the hotel by clicking any image on the page. Below is the HTML of one image:

<a data-id="88516347" data-thumb-url="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" href="https://cf.bstatic.com/images/hotel/max1024x768/885/88516347.jpg" target="_blank" class="bh-photo-grid-item bh-photo-grid-photo3 active-image " style="background-image: url(https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg);" onclick="return false;" title="The sunrise or sunset as seen from the apartment or nearby ">
<img src="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" class="hide" alt="The sunrise or sunset as seen from the apartment or nearby ">
</a>

Clicking the image opens a modal with a slideshow. Does anyone know how to fetch these images with PHP Goutte?

I found a Python sample that does this: https://github.com/basophobic/booking_scraper/blob/ec4382dd00970df0dab4b4df5d67143f9bbc2b21/web_scrapping.py

# click on the first image to open the image carousel
driver.find_element_by_class_name('bh-photo-grid-item').click()

# find the number of images for every hotel
tmp1 = driver.find_element_by_class_name('bh-photo-modal-caption-left').text
tmp = tmp1.split()
img_number = int(tmp[2])
print(img_number)

accommodation_fields['images'] = list()
# loop through every image to save the link
for image in range(img_number-1):
    img_href = driver.find_element_by_class_name('bh-photo-modal-image-element').find_element_by_tag_name('img').get_attribute('src')
    #print(img_href)
    accommodation_fields['images'].append(img_href)
    driver.find_element_by_class_name('bh-photo-modal-image-element').click()
print("Total images are: " + str(img_number-1))

But I don't know how to convert it to PHP with Goutte. Thank you.

There is 1 best solution below

Each photo in the grid has the class ".bh-photo-grid-item", and you can use it to get the images. Please try the code below and check whether it works for your case:

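// $crawler is assumed to be the Crawler that Goutte returned for the hotel page.
$images = [];

try {
    // Collect the href of every grid item; it points at the large version of the photo.
    $crawler->filter('.bh-photo-grid-item')->each(function ($node) use (&$images) {
        $images[] = $node->filter('a')->link()->getUri();
    });
} catch (\InvalidArgumentException $e) {
    // link() throws this when no <a> element is found for a grid item.
    echo $e->getMessage();
}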
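
Note that Goutte only fetches and parses the HTML the server returns; it does not run JavaScript, so the click that opens the modal in the Selenium sample has no equivalent here. Only photos that are already present as grid anchors in the page source can be collected this way. For comparison with the Python sample in the question, here is a minimal sketch of the same static extraction using only the standard library. It assumes the markup looks like the anchor shown in the question; `GridPhotoParser`, `extract_grid_images` and `SAMPLE_HTML` are names made up for this illustration.

```python
from html.parser import HTMLParser


class GridPhotoParser(HTMLParser):
    """Collect image URLs from <a class="bh-photo-grid-item ..."> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute holds several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if "bh-photo-grid-item" in classes and attributes.get("href"):
            self.images.append(
                {
                    # href carries the large image, data-thumb-url the thumbnail.
                    "full": attributes["href"],
                    "thumb": attributes.get("data-thumb-url"),
                }
            )


def extract_grid_images(html):
    """Return one dict per grid anchor found in the given HTML string."""
    parser = GridPhotoParser()
    parser.feed(html)
    parser.close()
    return parser.images


# Markup copied from the question (some attributes trimmed for brevity).
SAMPLE_HTML = """
<a data-id="88516347"
   data-thumb-url="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg"
   href="https://cf.bstatic.com/images/hotel/max1024x768/885/88516347.jpg"
   class="bh-photo-grid-item bh-photo-grid-photo3 active-image ">
<img src="https://cf.bstatic.com/images/hotel/max500/885/88516347.jpg" class="hide">
</a>
"""

if __name__ == "__main__":
    for image in extract_grid_images(SAMPLE_HTML):
        print(image["full"])
```

If the count in the modal caption is higher than the number of URLs collected, the remaining photos are probably loaded by JavaScript, and a browser-driven tool like the Selenium sample would be needed for those.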