PHP DOMNode traversal


I'm trying to parse an HTML document and get text values from tags, but the tags have no special attributes or IDs that I could target. The only thing I can anchor to is other static text, used as labels.

The source page code looks similar to this:

<tr>
  <td>
    <span>
      Some text to link to
    </span>
  </td>
  <td>
    <span>
      THE text to get
    </span>
  </td>
</tr>

/***************** Parser Page Script *****************/
$file = "src/src.htm";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

/********* Page that Processes *********/
// Regex pattern matching the label text
$pattern = "/Some text to link to/";

// getElementsByTagName() always returns a DOMNodeList, so no null check is needed
$elements = $doc->getElementsByTagName('td');

foreach ($elements as $node) {
    $text = $node->textContent;

    // Dump every td whose text contains the label
    if (preg_match($pattern, $text, $matches)) {
        echo "<pre>";
        print_r($node);
        echo "</pre>";
    }
}

How do I get the nextSibling value for the matched td when print_r only shows [nextSibling] => (object value omitted)?

There is 1 best solution below


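One possibility is to use XPath. The root element of a parsed HTML document is html, so an absolute path such as /table/tr/td/span matches nothing; a path starting with //, for example //tr/td/span, searches the whole document instead.

$file = "src/src.htm";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXPath($doc);
$elements = $xpath->query('//tr/td/span');

// query() returns false for an invalid expression;
// an empty DOMNodeList simply skips the loop
if ($elements !== false) {
    foreach ($elements as $element) {
        echo $element->nodeValue;
    }
}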
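
To get the cell next to the label rather than every span, two approaches seem workable; the following is a minimal sketch, not a definitive implementation. The "(object value omitted)" text is only how print_r abbreviates nested DOM objects, and the nextSibling property itself can still be read directly. The function names valueAfterLabel and nextElementSiblingOf and the inline sample markup are hypothetical, chosen for illustration; the sample assumes the row sits inside a table, as the XPath above suggests.

```php
<?php
// Hypothetical sample markup modeled on the question's snippet.
$html = <<<HTML
<table>
 <tr>
  <td><span> Some text to link to </span></td>
  <td><span> THE text to get </span></td>
 </tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// Approach 1: let XPath find the label cell and step to the next td.
// Assumes $label contains no double quotes (it is embedded in the query as-is).
function valueAfterLabel(DOMDocument $doc, string $label): ?string
{
    $xpath = new DOMXPath($doc);
    // normalize-space(.) collapses the whitespace around the label text.
    $query = sprintf(
        '//td[contains(normalize-space(.), "%s")]/following-sibling::td[1]',
        $label
    );
    $cells = $xpath->query($query);
    if ($cells === false || $cells->length === 0) {
        return null;
    }
    return trim($cells->item(0)->textContent);
}

// Approach 2: walk the DOM. nextSibling may be a whitespace text node,
// so skip ahead until an element node (or the end of the list) is reached.
function nextElementSiblingOf(DOMNode $node): ?DOMNode
{
    $sibling = $node->nextSibling;
    while ($sibling !== null && $sibling->nodeType !== XML_ELEMENT_NODE) {
        $sibling = $sibling->nextSibling;
    }
    return $sibling;
}

echo valueAfterLabel($doc, 'Some text to link to'), PHP_EOL;

foreach ($doc->getElementsByTagName('td') as $td) {
    if (strpos($td->textContent, 'Some text to link to') !== false) {
        $next = nextElementSiblingOf($td);
        if ($next !== null) {
            echo trim($next->textContent), PHP_EOL;
        }
    }
}
```

The XPath version keeps the matching logic in one expression, while the DOM walk fits more naturally into the loop from the question. If each label appears only once per page, either should return the same cell.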