I'm trying to parse an HTML document and extract text values from tags. The problem is that the tags have no distinguishing attributes or IDs to target them by. The only thing I can anchor to is other static text that serves as a label.
The source page markup looks similar to this:
<tr>
  <td>
    <span>
      Some text to link to
    </span>
  </td>
  <td>
    <span>
      THE text to get
    </span>
  </td>
</tr>
/***************** Parser page script *****************/
$file = "src/src.htm";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

/***************** Page that processes *****************/
// Pattern for the label text used as an anchor
$pattern = "/Some text to link to/";
// getElementsByTagName() always returns a DOMNodeList (possibly empty), never null
$elements = $doc->getElementsByTagName('td');
foreach ($elements as $node) {
    $text = $node->textContent;
    if (preg_match($pattern, $text)) {
        // print_r() does not expand nested DOM objects, so nextSibling
        // is shown as "(object value omitted)"
        echo "<pre>";
        print_r($node);
        echo "</pre>";
    }
}
How can I get the value of the nextSibling of the matched td, when print_r() only shows [nextSibling] => (object value omitted)?
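A likely reason `nextSibling` looks unhelpful: in markup like the snippet above, the node right after a `<td>` is often the whitespace text node (a `DOMText`) between the two cells rather than the next `<td>`, and `print_r()` does not expand nested DOM objects anyway. A minimal sketch, assuming the markup is loaded from a string, walks `nextSibling` until it reaches an element node and reads that node's `textContent`. The `valueForLabel()` helper and the `$html` sample are hypothetical names for illustration:

```php
<?php
// Hypothetical helper: find the <td> whose trimmed text equals $label and
// return the trimmed text of the next <td> element in the same row.
function valueForLabel(DOMDocument $doc, string $label): ?string
{
    foreach ($doc->getElementsByTagName('td') as $td) {
        if (trim($td->textContent) !== $label) {
            continue;
        }
        // nextSibling may be a whitespace DOMText node, so skip non-elements.
        $sibling = $td->nextSibling;
        while ($sibling !== null && $sibling->nodeType !== XML_ELEMENT_NODE) {
            $sibling = $sibling->nextSibling;
        }
        return $sibling === null ? null : trim($sibling->textContent);
    }
    return null;
}

// Sample markup modeled on the question, wrapped in <table> so the row parses.
$html = '<table><tr>
<td><span>Some text to link to</span></td>
<td><span>THE text to get</span></td>
</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

echo valueForLabel($doc, 'Some text to link to'), "\n"; // THE text to get
```

Comparing `trim($td->textContent)` with the exact label, instead of a regex, avoids matching a cell that merely contains the label text as part of something longer.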
One possibility is to use XPath, for example: /table/tr/td/span
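The XPath in that comment selects every span in the table rather than the one next to a particular label. A sketch that anchors on the label text uses `DOMXPath` with the `following-sibling` axis; the `$html` sample and variable names are assumptions for illustration:

```php
<?php
// Sample markup modeled on the question.
$html = '<table><tr>
<td><span>Some text to link to</span></td>
<td><span>THE text to get</span></td>
</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Find the <td> whose whitespace-normalized text equals the label, then take
// the first <td> element after it in the same row. The "td" node test only
// matches elements, so whitespace text nodes between cells are skipped.
$query = "//td[normalize-space(.)='Some text to link to']/following-sibling::td[1]";
$nodes = $xpath->query($query);

$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $value, "\n"; // THE text to get
```

`DOMXPath::evaluate()` can also return the string directly when the expression itself yields a string, e.g. by wrapping the same path in `normalize-space(...)`. If the label contains a single quote, the string literal inside the query would need different quoting.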