simple_html_dom: scrape all lines with a characteristic and then output them below


I've gotten this far scraping with simple_html_dom, going by the examples:

<?php
require 'simple_html_dom.php';
$html = file_get_html('https://nitter.absturztau.be/chillartaholic');
$title = $html->find('title', 0);
$image = $html->find('img', 0);
echo $title->plaintext."<br>\n";
echo $image->src;
?>

However, instead of retrieving a title and an image, I'd like to get all lines in the target page that begin with:

<a class="tweet-link"

and display the lines scraped - in their entirety - top to bottom below.

The first scraped line would then be:

> <a class="tweet-link"
> href="/ChillArtaholic/status/1413973360841744390#m"></a>

Is this possible with simple_html_dom, or are there limits on the number of lines that can be scraped at all?

There is 1 best solution below

repeekyraid cero On BEST ANSWER

Strangely enough, the answer from yesterday is gone.

This was the consensus that works (although their answer also had many other approaches):

<?php
// Fetch the page first; the markup must exist before it is parsed.
$url = 'https://nitter.absturztau.be/chillartaholic';
$html = file_get_contents($url);

// Parse the HTML; @ suppresses warnings about malformed markup.
$dom = new DOMDocument();
@$dom->loadHTML($html);

// Select every <a> whose class attribute is exactly "tweet-link".
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a[@class="tweet-link"]');

foreach ($nodes as $node) {
    echo $node->nodeValue; // text content; empty when the link has no text
    echo $node->getAttribute('href'), '<br>';
}
?>
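
The loop above prints only the href, while the question asked for each matching line in its entirety. One way to get the whole tag might be DOMDocument::saveHTML() called with the node as its argument. The following is only a sketch under some assumptions: the sample markup stands in for the fetched page (its second tweet link is made up), tweetLinkMarkup() is a hypothetical helper name, and the XPath matches "tweet-link" as one class among several rather than as the exact attribute value.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
// Only the first tweet link comes from the question; the rest is made up.
$html = <<<HTML
<div class="timeline">
  <a class="tweet-link" href="/ChillArtaholic/status/1413973360841744390#m"></a>
  <a class="other" href="/about"></a>
  <a class="tweet-link" href="/ChillArtaholic/status/2#m"></a>
</div>
HTML;

// Hypothetical helper: returns the full markup of every <a> element
// that carries the tweet-link class, in document order.
function tweetLinkMarkup(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match "tweet-link" as a whole class token, even among other classes.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        '//a[contains(concat(" ", normalize-space(@class), " "), " tweet-link ")]'
    );

    $lines = [];
    foreach ($nodes as $node) {
        // saveHTML() with a node argument serializes just that element.
        $lines[] = $dom->saveHTML($node);
    }
    return $lines;
}

// Escape the markup so a browser shows the tag instead of rendering it.
foreach (tweetLinkMarkup($html) as $line) {
    echo htmlspecialchars($line), "<br>\n";
}
```

To run this against the live page, $html would presumably come from file_get_contents($url) as in the answer. The result is re-serialized markup, so whitespace and attribute quoting may differ from the raw source line.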