I am fetching HTML from a website with file_get_contents. The HTML contains a table (with a class name), and I want to get the data inside its tags.
This is how I fetch the html data from url:
$url = 'http://example.com';
$content = file_get_contents($url);
The html looks like:
<table class="space">
<thead></thead>
<tbody>
<tr>
<td class="marsia">1</td>
<td class="mars">
<div>Mars</div>
</td>
</tr>
<tr>
<td class="earthia">2</td>
<td class="earth">
<div>Earth</div>
</td>
</tr>
</tbody>
</table>
Is there a way to search DOM elements in PHP like we do in jQuery, so that I can access the values 1 and 2 (first td) and the div's value inside the second td?
Something like:
a) search the HTML for a table with the class name space
b) inside that table, inside tbody, return each tr's first td value and the div's value inside the second td
So I get: 1 and Mars, 2 and Earth.
Use the DOM extension, for example. Its DOMXPath class is particularly useful for this kind of task.
You can express the listed conditions with an XPath expression like this:
//table[@class="space"]//tr[count(td) = 2]/td
where
- //table[@class="space"] selects all table elements in the document whose class attribute value is equal to the string "space";
- //tr[count(td) = 2] selects all tr elements that have exactly two td child elements;
- /td represents the td elements.
Sample implementation:
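A minimal sketch of such an implementation, not necessarily the original listing. It assumes the sample markup from the question is inlined in a string (in practice it would come from file_get_contents) and that the closing tbody tag is correct; the variable names $cells and $pairs are illustrative. Because the expression returns a flat list of cells, the sketch walks the list two cells at a time.

```php
<?php
// Sample markup from the question, inlined so the sketch runs without a network call.
$html = <<<'HTML'
<table class="space">
<thead></thead>
<tbody>
<tr><td class="marsia">1</td><td class="mars">
<div>Mars</div>
</td></tr>
<tr><td class="earthia">2</td><td class="earth">
<div>Earth</div>
</td></tr>
</tbody>
</table>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Flat list of cells, two per matching row, in document order.
$cells = $xpath->query('//table[@class="space"]//tr[count(td) = 2]/td');

$pairs = [];
for ($i = 0; $i + 1 < $cells->length; $i += 2) {
    $pairs[] = [
        trim($cells->item($i)->textContent),     // first td: the number
        trim($cells->item($i + 1)->textContent), // second td: text of the nested div
    ];
}

foreach ($pairs as [$number, $name]) {
    echo "$number and $name", PHP_EOL;
}
```

The trim() calls are there because the second cell's textContent includes the line breaks around the div.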
Output (for the HTML in the question): the pairs 1 and Mars, 2 and Earth.
Treat the code above as a sample rather than something to use as-is, because it does not scale well: the logic depends on the XPath expression selecting exactly two cells for each row. In practice, you may want to select the rows, iterate over them, and put the extra conditions into the loop, e.g.:
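A sketch of that row-by-row approach, under the same assumptions as before (question markup inlined, closing tbody tag corrected). The $result array and the fallback to the cell's own text when no div is present are illustrative choices, not part of the original answer.

```php
<?php
// Sample markup from the question, inlined so the sketch runs without a network call.
$html = <<<'HTML'
<table class="space">
<thead></thead>
<tbody>
<tr><td class="marsia">1</td><td class="mars">
<div>Mars</div>
</td></tr>
<tr><td class="earthia">2</td><td class="earth">
<div>Earth</div>
</td></tr>
</tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$result = [];

// Select the rows only; the per-row conditions live inside the loop.
foreach ($xpath->query('//table[@class="space"]//tr') as $tr) {
    // 'td' is evaluated relative to the current row, passed as the context node.
    $cells = $xpath->query('td', $tr);
    if ($cells->length < 2) {
        continue; // skip rows without at least two cells
    }
    $number = trim($cells->item(0)->textContent);
    // Prefer the div inside the second cell; fall back to the cell itself.
    $div = $xpath->query('div', $cells->item(1))->item(0);
    $name = trim(($div ?? $cells->item(1))->textContent);
    $result[] = [$number, $name];
}

foreach ($result as [$number, $name]) {
    echo "$number and $name", PHP_EOL;
}
```

Keeping the conditions in the loop makes it easy to handle rows with extra cells or a different layout without rewriting the XPath expression.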
In the loop, DOMXPath::query() is called with an XPath expression relative to the current row ($tr), and the code then checks that the returned DOMNodeList contains at least two cells. The rest of the code is trivial.
You can also use the SimpleXML extension, which supports XPath as well, but it is much less flexible than the DOM extension.
For huge documents, use extensions built on SAX-based parsers, such as XMLReader.
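For comparison, a sketch of the SimpleXML variant. SimpleXML expects well-formed XML, so this sketch assumes the HTML is first parsed with DOMDocument and then handed over with simplexml_import_dom(); that route, the relaxed count(td) >= 2 condition, and the $planets name are choices of this example rather than something the answer prescribes.

```php
<?php
// Sample markup from the question, inlined so the sketch runs without a network call.
$html = <<<'HTML'
<table class="space">
<thead></thead>
<tbody>
<tr><td class="marsia">1</td><td class="mars">
<div>Mars</div>
</td></tr>
<tr><td class="earthia">2</td><td class="earth">
<div>Earth</div>
</td></tr>
</tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

// Convert the parsed DOM tree into a SimpleXMLElement rooted at <html>.
$xml = simplexml_import_dom($doc);

$planets = [];
foreach ($xml->xpath('//table[@class="space"]//tr[count(td) >= 2]') as $tr) {
    $planets[] = [
        trim((string) $tr->td[0]),      // text of the first cell
        trim((string) $tr->td[1]->div), // text of the div in the second cell
    ];
}

foreach ($planets as [$number, $name]) {
    echo "$number and $name", PHP_EOL;
}
```

The property-style access ($tr->td[1]->div) is shorter than the DOM calls, but it offers fewer ways to inspect or modify the tree.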