Extracting class name using DomCrawler


I am trying to scrape a review score from Trustpilot. The HTML block looks like this:

<div class="review-info__header" v-pre="">
    <div class="review-info__header__verified">
        <div class="star-rating star-rating-1 star-rating--medium">
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 1">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 2">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 3">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 4">
            </div>
            <div class="star-item star-item--color">
                <img src="https://cdn.trustpilot.net/brand-assets/1.3.0/single-star-transparent.svg" alt="Star 5">
            </div>
        </div>
    </div>
</div>

I can tell the rating from the class star-rating-1, where the trailing number is the rating out of 5.

I am using DomCrawler and have loaded the HTML into a variable. I am then trying:

$rating = $review->filter('.review-info__header')->filter('.star-rating')->filterXPath('div[contains(@class, "star-rating-")]');

If I then output the HTML for this node with $rating->html(), I can see that it is in the right location, because it outputs the inner HTML.

I have a couple of questions. Firstly, how can I extract the number from the class name, so I can determine the rating?
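
One possible approach to the first question, as a minimal sketch: read the class attribute as a string and pull the number out with a regular expression. With DomCrawler the string would presumably come from $rating->attr('class'). The helper name extractRating is hypothetical, and the sample string below is hard-coded so the snippet runs without the library.

```php
<?php
// Hypothetical helper: extract the rating from a class attribute string.
// Returns null when no "star-rating-<digits>" class is present.
function extractRating(string $classAttr): ?int
{
    // Match "star-rating-" followed by digits as a whole class token,
    // so "star-rating" and "star-rating--medium" are ignored.
    if (preg_match('/(?:^|\s)star-rating-(\d+)(?:\s|$)/', $classAttr, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Assumption: with DomCrawler this string would be $rating->attr('class').
$classAttr = 'star-rating star-rating-1 star-rating--medium';
var_dump(extractRating($classAttr)); // int(1)
```

Returning null instead of 0 keeps "no rating class found" distinguishable from a real score.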

Secondly, if I remove the first and second filters, I get the error "The current node list is empty". Is there a reason for this?

P.S. The first filter is for a parent div that I have not shown, but it exists.
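
A likely explanation for the empty node list, offered as an assumption rather than confirmed behaviour: as far as I understand, DomCrawler evaluates an XPath such as div[...] relative to the current node and effectively treats it as self::div[...], so it only matches when the current node is itself the star-rating div. The sketch below reproduces that difference with PHP's built-in DOMXPath instead of DomCrawler. The "review" wrapper class is invented for illustration.

```php
<?php
// Simplified stand-in markup; the "review" wrapper is hypothetical.
$html = '<div class="review">'
      . '<div class="star-rating star-rating-1 star-rating--medium"></div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Context node standing in for $review: the outer wrapper div.
$review = $xpath->query('//div[@class="review"]')->item(0);

$test = 'div[contains(@class, "star-rating-")]';

// self:: only tests the context node itself, so the wrapper does not match.
$selfOnly = $xpath->query('self::' . $test, $review);

// descendant-or-self:: also searches everything below the context node.
$deep = $xpath->query('descendant-or-self::' . $test, $review);

echo $selfOnly->length, ' ', $deep->length, PHP_EOL; // 0 1
```

If that assumption holds, passing '//div[contains(@class, "star-rating-")]' or the descendant-or-self:: form to filterXPath should find the node without the preceding filter calls.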

Thanks
