I'm trying to scrape a webpage using PHP Simple HTML DOM Parser.
$html = str_get_html('<div class="namepageheader">
<div class="u">Name: <a href="someurl">Noor Shaad</a>
<div class="u">Age: </div>
</div>');
$name = $html->find('div[class="u"]', 0)->innertext;
$age = $html->find('div[class="u"]', 1)->innertext;
I tried my best to get the text from each class="u" element, but it didn't work because the closing </div> tag is missing on the first <div class="u">. Can anyone help me out with that?
You can find an element close to where the tag should have been closed, and then normalize the HTML with a string replacement before parsing it. For example, you can replace the </a> tag with </a></div>. Or, if the page contains other </a> tags that should not be affected, replace </a><div class="u"> with </a></div><div class="u"> instead.
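As a minimal sketch of the second replacement, assuming the markup has no whitespace between the </a> and the following <div class="u"> (the helper name closeUnclosedDiv is hypothetical, not part of any library):

```php
<?php
// Hypothetical helper: inserts the missing </div> between the link
// and the next <div class="u">, using a plain literal replacement.
function closeUnclosedDiv(string $html): string
{
    return str_replace(
        '</a><div class="u">',
        '</a></div><div class="u">',
        $html
    );
}

// Sample markup from the question, written on one line (no gaps between tags).
$broken = '<div class="namepageheader"><div class="u">Name: <a href="someurl">Noor Shaad</a><div class="u">Age: </div></div>';

$fixed = closeUnclosedDiv($broken);
echo $fixed, PHP_EOL;
```

Matching the longer string </a><div class="u"> rather than a bare </a> keeps other links on the page untouched.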
There may be another problem: if there is whitespace (for example a line break) between the tags, the replacement string will not match and nothing is replaced. To solve this, first remove the whitespace between the tags and then do the replacement.
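The two steps could be sketched roughly like this, using the multi-line markup from the question. The function name repairMarkup is hypothetical, and the regular expression is only one way to strip the gaps:

```php
<?php
// Hypothetical helper: strips whitespace between tags, then closes the
// unclosed <div class="u"> with a literal replacement.
function repairMarkup(string $html): string
{
    // Step 1: remove whitespace (including line breaks) that sits directly
    // between a closing ">" and the next opening "<".
    // Fall back to the original string if preg_replace reports an error.
    $compact = preg_replace('/>\s+</', '><', $html) ?? $html;

    // Step 2: with the gap gone, the literal search string can match.
    return str_replace(
        '</a><div class="u">',
        '</a></div><div class="u">',
        $compact
    );
}

// Markup from the question: the first <div class="u"> is never closed,
// and each tag is on its own line.
$broken = <<<'HTML'
<div class="namepageheader">
<div class="u">Name: <a href="someurl">Noor Shaad</a>
<div class="u">Age: </div>
</div>
HTML;

$repaired = repairMarkup($broken);
echo $repaired, PHP_EOL;
```

The repaired string can then be handed to the parser (for instance through str_get_html, if that is how the page is loaded), after which find('div[class="u"]', 0) and find('div[class="u"]', 1) should refer to two sibling elements. Note that removing whitespace between tags can also remove meaningful spaces between adjacent inline elements, so it is worth checking the extracted text afterwards.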