Symfony + DomCrawler - how to extract data attributes from a <div>

Question

Asked by LarryN on 17 August 2025 at 21:56 · 2.6k views

I'm using Symfony 2.8 and DomCrawler to parse a web site, and I'm having a problem reading data attributes from an HTML element. It might be as simple as a specific convention for data attributes, but I haven't been able to find any references or examples on the web that discuss how to retrieve data attributes via DomCrawler.

Here are the details:

I have encountered an instance of this construct in the HTML I am parsing (from another web site, so I can't modify this HTML):

  <div class='slideshowclass' id='slideshow'>           
    <div data-thumb='http://www.example.com/thumbs/1.jpg'
        data-src='http://www.example.com/thumbs/1.jpg'></div>
    <div data-thumb='http://www.example.com/thumbs/2.jpg'
        data-src='http://www.example.com/thumbs/2.jpg'></div>
    <div data-thumb='http://www.example.com/thumbs/3.jpg'
        data-src='http://www.example.com/thumbs/3.jpg'></div>
    <div data-thumb='http://www.example.com/thumbs/4.jpg'
        data-src='http://www.example.com/thumbs/4.jpg'></div>
    <div data-thumb='http://www.example.com/thumbs/5.jpg'
        data-src='http://www.example.com/thumbs/5.jpg'></div>
    <div data-thumb='http://www.example.com/thumbs/6.jpg'
        data-src='http://www.example.com/6.jpg'></div>
  </div>

I'm using this code to search the block of divs and return the data-src values:

function getList( Crawler $pWebDoc ) {
    $list = $pWebDoc->filter( 'div#slideshow');
    if ( !$list )
        return null;

    $retlist = null;
    $x = $list->count();
    if ( $x > 0 ) {
        /* @var $item Crawler */
        $retlist = $list->children()->each( function (Crawler $item, $i ) {
            return ( "$i:" . $item->attr( 'data-src' ));
        });
    }

    return ( $retlist );
}

From the DomCrawler docs I expect the attr function to return the data-src attribute value, but it returns null. My function returns an array of 6 elements, each containing just the index and no additional text.

Thanks in advance for your help.


**Shaun Bramley** · Answer 1

This is easy to do with PHP's DOMDocument and DOMXPath classes directly. An XPath query can select the attribute nodes themselves, so you get the values rather than the elements.

/**
 * Filters the list of nodes with an XPath expression.
 *
 * The XPath expression should already be processed to apply it in the context of each node.
 *
 * @param string $xpath
 *
 * @return Crawler
 */
private function filterRelativeXPath($xpath)
{
    $prefixes = $this->findNamespacePrefixes($xpath);
    $crawler = $this->createSubCrawler(null);
    foreach ($this->nodes as $node) {
        $domxpath = $this->createDOMXPath($node->ownerDocument, $prefixes);
        $crawler->add($domxpath->query($xpath, $node));
    }
    return $crawler;
}

The function above is from Symfony's Crawler.php. In my experience the Crawler wasn't happy with complex XPath expressions, so I switched from DomCrawler to using DOMXPath and DOMDocument directly.

Your base XPath query would be something like //div/@data-src
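
As a rough sketch of that approach, assuming PHP 7 or later with the bundled DOM extension: the helper name getDataSrcList, the shortened sample markup, and the narrower query //div[@id='slideshow']/div/@data-src are illustrative assumptions, not part of the original answer. It loads the HTML with DOMDocument and reads the matched attribute nodes with DOMXPath.

```php
<?php
// Shortened sample markup with the same structure as in the question
// (placeholder URLs, assumed for illustration only).
$html = <<<'HTML'
<div class='slideshowclass' id='slideshow'>
    <div data-thumb='http://www.example.com/thumbs/1.jpg'
        data-src='http://www.example.com/1.jpg'></div>
    <div data-thumb='http://www.example.com/thumbs/2.jpg'
        data-src='http://www.example.com/2.jpg'></div>
</div>
HTML;

/**
 * Hypothetical helper: returns the data-src value of every <div>
 * directly inside <div id="slideshow">.
 */
function getDataSrcList(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often invalid; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // The query selects attribute nodes, not elements.
    $values = [];
    foreach ($xpath->query("//div[@id='slideshow']/div/@data-src") as $attr) {
        $values[] = $attr->nodeValue;
    }

    return $values;
}

print_r(getDataSrcList($html));
```

Restricting the query to the children of the slideshow div avoids picking up data-src attributes elsewhere on the page, which the broader //div/@data-src would also match.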
