I am trying to scrape a review score from Trustpilot. The HTML block looks like this:
<div class="review-info__header" v-pre="">
    <div class="review-info__header__verified">
        <div class="star-rating star-rating-1 star-rating--medium">
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 1">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 2">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 3">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 4">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 5">
            </div>
        </div>
    </div>
</div>
I can tell the rating from the class star-rating-1, where the trailing number is the rating out of 5.
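To make the goal concrete, here is a minimal sketch in plain PHP of the extraction I am after, assuming the class attribute is already available as a string. The helper name ratingFromClass is hypothetical and not part of DomCrawler. Whether this is the idiomatic way to do it is part of what I am asking.

```php
<?php
// Hypothetical helper (not part of DomCrawler): pull the rating out of a
// class attribute such as "star-rating star-rating-1 star-rating--medium".
function ratingFromClass(string $class): ?int
{
    // Match "star-rating-" followed by digits as a whole class token.
    // Modifier classes like "star-rating--medium" do not match, because
    // a second hyphen follows the prefix instead of a digit.
    if (preg_match('/(?:^|\s)star-rating-(\d+)(?:\s|$)/', $class, $m) === 1) {
        return (int) $m[1];
    }

    return null; // no rating class present
}

var_dump(ratingFromClass('star-rating star-rating-1 star-rating--medium')); // int(1)
```

With DomCrawler, the input string would presumably come from calling attr('class') on the matched node.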
I am using DomCrawler, so I put the HTML into a variable and then try:
$rating = $review->filter('.review-info__header')->filter('.star-rating')->filterXPath('div[contains(@class, "star-rating-")]');
If I then output the HTML for this node with $rating->html(), I can see it is in the right location, as it outputs the inner HTML.
I have a couple of questions. Firstly, how can I extract the number from the class name, so I can determine the rating?
Secondly, if I remove the first and second filters, I get the error "The current node list is empty". Is there a reason for this?
P.S. The first filter is for a parent div that I have not shown, but it exists.
Thanks