I'm using DomCrawler to get data from a Google Play page, and it works in 99% of cases, but I stumbled upon a page where it cannot find a specific div. I checked the HTML and the div is definitely there. My code is:
$autoloader = require __DIR__.'\vendor\autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$app_id = 'com.balintinfotech.sinhalesekeyboardfree';
$response = file_get_contents('https://play.google.com/store/apps/details?id='.$app_id);
$crawler = new Crawler($response);
echo $crawler->filter('div[itemprop="datePublished"]')->text();
When I run it against that specific page, I get:
PHP Fatal error: Uncaught InvalidArgumentException: The current node list is empty.
However, if I use any other ID, I get the desired result. What exactly is it about that page that breaks DomCrawler?
As you correctly figured out, this doesn't happen in the English version, but it does in the Spanish one.
One difference I could spot was a comment by a user saying නියමයි ඈ. There seems to be something there that bothers the Crawler. If you replace the null character (\x00) with an empty string, it correctly gets what you're looking for. I'll try to look more into this.
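A minimal sketch of that workaround, assuming the null byte really is what trips up the parser on this page. The helper names `stripNullBytes` and `firstTextOrNull` are hypothetical, not part of DomCrawler. The second helper also checks `count()` before calling `text()`, so a missing node returns `null` instead of throwing the "current node list is empty" exception.

```php
<?php
use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: remove NUL bytes from the raw HTML before parsing.
// "\x00" must be in double quotes so PHP interprets the escape sequence.
function stripNullBytes(string $html): string
{
    return str_replace("\x00", '', $html);
}

// Hypothetical helper: text of the first node matching $selector,
// or null when the selector matches nothing.
function firstTextOrNull(string $html, string $selector): ?string
{
    $crawler = new Crawler(stripNullBytes($html));
    $nodes = $crawler->filter($selector);

    return $nodes->count() > 0 ? $nodes->text() : null;
}
```

With the Composer autoloader loaded and `$response` fetched as in the question, `firstTextOrNull($response, 'div[itemprop="datePublished"]')` would then stand in for the original `filter(...)->text()` call.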