Getting first level using TFHpple


I am having some trouble using TFHpple. I would like to parse the following HTML:

<div class=\"head\" style=\"height: 69.89px; line-height: 69.89px;\">
    <div class=\"cell editable\" style=\"width: 135px;\"contenteditable=\"true\">
        <p>&nbsp;1</p>
    </div>
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>2</p>
    </div>
</div>

<div style=\"height: 69.89px; line-height: 69.89px;\" class=\"head\">
    <div class=\"cell\" style=\"width: 135px; text-align: left;\"contenteditable=\"false\">
        <p>3&nbsp;</p>
    </div>
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>4</p>
    </div>
</div>

<div style=\"height: 69.89px; line-height: 69.89px;\" class=\"\">
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>5</p>
    </div>
    <div class=\"cell\" style=\"width: 135px;\" contenteditable=\"false\">
        <p>6</p>
    </div>
</div>

For now I would like to put the first-level div "elements" (sorry, I don't know the proper terminology) into an array. I tried simply passing /div as the XPath to the searchWithXPathQuery: method, but it doesn't find anything.

My second attempt was to use a path such as //div[@class=\"head\"] while also allowing [@class=\"\"], but I don't know whether that is even possible. (I want this because I need the elements to appear in the array in the same order as they do in the data.)

So here is my question: is there a particular reason why TFHpple wouldn't work with /div? And if there is no way to select just the first level of div elements, is it possible to write an XPath predicate on the value of an attribute (here, the class attribute)? If so, how? I have searched quite a lot and couldn't find anything.

Thanks for your help.

PS: If it helps, here is the code I use to try to parse the data, which is initially held in the string self.material.Text:

NSData * data = [self.material.Text dataUsingEncoding:NSUnicodeStringEncoding];
TFHpple * tableParser = [TFHpple hppleWithHTMLData:data];
NSString * firstXPath = @"/div";
NSArray<TFHppleElement *> * tableHeader = [tableParser searchWithXPathQuery:firstXPath];
NSLog(@"We found : %lu", (unsigned long)tableHeader.count);

Abel On BEST ANSWER

You wrote:

Getting first level using TFHpple

I assume you mean: without also getting all descendants?

Taking your other requirements into account, you can do so as follows:

//div[not(ancestor::div)][@class='head' or @class='']

Dissecting this:

  • Select all div elements (yes, correct term ;) at any level in the whole document: //div
  • Filter with a predicate (the part between square brackets) that keeps only the elements not themselves contained in another div, by checking that there is no div ancestor (a parent, a parent of a parent, and so on): [not(ancestor::div)]
  • Filter by your other requirements: [@class='head' or @class='']

Note 1: the input you show is not well-formed XML, because it contains multiple root elements. An XML document must have exactly one root element.

Note 2: if you would rather first select all divs whose @class is 'head' or empty, and then keep only those that are "first level", reverse the predicates:

//div[@class='head' or @class=''][not(ancestor::div)]
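
To tie the expression back to the code in the question, here is a minimal sketch of running it through TFHpple. It uses only hppleWithHTMLData: and searchWithXPathQuery: from the question, plus TFHppleElement's objectForKey: and children accessors. The helper name TopLevelDivs, the shortened sample markup, and the choice of UTF-8 are assumptions made for illustration and are not taken from the original post.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // third-party library; assumed to be on the header search path

// Hypothetical helper (not part of TFHpple): returns the div elements that
// have no div ancestor and whose class attribute is "head" or empty.
static NSArray<TFHppleElement *> *TopLevelDivs(NSString *html) {
    // UTF-8 is an assumption of this sketch; the question uses NSUnicodeStringEncoding.
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [TFHpple hppleWithHTMLData:data];
    NSString *query = @"//div[not(ancestor::div)][@class='head' or @class='']";
    return [parser searchWithXPathQuery:query];
}

int main(void) {
    @autoreleasepool {
        // Shortened, illustrative version of the markup in the question.
        NSString *html =
            @"<div class=\"head\"><div class=\"cell\"><p>1</p></div></div>"
            @"<div class=\"\"><div class=\"cell\"><p>5</p></div></div>";
        for (TFHppleElement *div in TopLevelDivs(html)) {
            // objectForKey: looks up the value of the named attribute.
            NSLog(@"class = '%@', children = %lu",
                  [div objectForKey:@"class"],
                  (unsigned long)div.children.count);
        }
    }
    return 0;
}
```

As for why /div finds nothing, a likely explanation (an assumption, not something confirmed in this thread) is that hppleWithHTMLData: hands the data to libxml2's HTML parser, which normally wraps a bare fragment in html and body elements. The div elements then sit below the document root, so a query anchored at the root does not reach them, while a //div query does.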

har07 On

You can use the following XPath expression to get the div elements (that is indeed the correct term) whose class attribute value is either "head" or empty:

//div[@class='head' or @class='']