hpple: is it possible to get a text value like JavaScript's textContent?


Is it possible to get all the text content of an element's children, recursively, in hpple? Is there a method in the TFHppleElement class for this, similar to JavaScript's textContent:

document.getElementById("testdiv").textContent

There are 2 best solutions below

SergStav On

I use this code to get the full content of each news title:

        NSURL *newURL = [NSURL URLWithString:@"http://somesite"];
        // dataWithContentsOfURL: is synchronous, so avoid calling it on the main thread.
        NSData *newsData = [NSData dataWithContentsOfURL:newURL];

        TFHpple *newsParser = [TFHpple hppleWithHTMLData:newsData];

        NSString *newsXpathQueryString = @"//div[@class='item column-1']";
        NSArray *newsNodes = [newsParser searchWithXPathQuery:newsXpathQueryString];

        NSMutableArray *newNews = [[NSMutableArray alloc] initWithCapacity:0];

        for (TFHppleElement *element in newsNodes)
        {
            News *news = [[News alloc] init];
            [newNews addObject:news];

            // Trim surrounding whitespace and newlines from the element's content.
            news.title = [[element content] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

            // Attribute lookup on the matched element; nil if it has no src attribute.
            news.photo_url = [element objectForKey:@"src"];
        }

        // Publish the results and reload the table once, after the loop has finished.
        _allNews = newNews;
        [self.tableView reloadData];
    }

You can use

news.title = [[element firstChild] content];

to get the content of the element's first child.
karim On

I wanted something like this. It is quick boilerplate code, and it is not an elegant solution because of the static contents variable. Please let me know how it can be improved :)

#pragma mark - Hpple XML parser

/* The document contains lots of nested div, table, span, style, etc. */
- (NSString *) extractDefinition
{
    NSString *html = [self.webView stringByEvaluatingJavaScriptFromString: @"document.getElementById('innerframe').innerHTML"];
    if ([Resources stringIsEmpty:html]) {
        return nil;
    }

    return [self extractSubDiv:html];
}

- (NSString *)extractSubDiv:(NSString *)html
{
    TFHpple *hppleParser = [TFHpple hppleWithHTMLData:[html dataUsingEncoding:NSUTF8StringEncoding]];

    NSString * xpathQuery;
    xpathQuery = @"//div[@id='columnboth']";
    NSArray * defNodes = [hppleParser searchWithXPathQuery:xpathQuery];
    NSString * text = nil;
    if ([defNodes count] > 0) {
        TFHppleElement * element = [defNodes objectAtIndex:0];
        text = [self parseContents:element];
    } else {
        xpathQuery = @"//div[@id='columnsingle']";
        defNodes = [hppleParser searchWithXPathQuery:xpathQuery];
        if ([defNodes count] > 0) {
            TFHppleElement * element = [defNodes objectAtIndex:0];
            text = [self parseContents:element];
        }
    }
    return text;
}

- (NSString *) parseContents:(TFHppleElement *)element {
    NSArray * innhold = [element searchWithXPathQuery:@"//div[contains(@class,'articlecontents')]"];
    return [self getTextFromArray:innhold];
}


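/* Shared buffer that getText: appends to while it recurses. */
static NSMutableString * contents;

- (NSString *) getTextFromArray:(NSArray *)hppleElements {
    NSMutableString * text = [[NSMutableString new] autorelease];
    for (TFHppleElement * e in hppleElements) {
        // Reset the shared buffer for each element; otherwise the text gathered
        // from earlier elements would be appended to the result again.
        contents = [[NSMutableString new] autorelease];
        [text appendFormat:@"%@ ", [self getText:e]];
    }
    return text;
}

/* Here are more nested divs, and then spans holding the text. */
- (NSString *) getText:(TFHppleElement *)element
{
    // Text nodes carry the actual characters; collect them in document order.
    if ([element isTextNode]) {
        [contents appendFormat:@" %@", element.content];
    }

    // Depth-first walk over all descendants.
    for (TFHppleElement * e in element.children) {
        [self getText:e];
    }

    return contents;
}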
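
One possible way to avoid the static contents buffer is to pass the buffer down through the recursion. This is only a sketch: it assumes hpple's TFHppleElement provides isTextNode, children and content, as used in the code above. The category name TextContent and the methods hppleTextContent and hpple_appendTextTo: are made up for this example and are not part of hpple. Whitespace may come out differently from JavaScript's textContent, depending on how the hpple version in use stores text nodes.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"
#import "TFHppleElement.h"

// Hypothetical category; these method names are not part of hpple.
@interface TFHppleElement (TextContent)
- (NSString *)hppleTextContent;
@end

@implementation TFHppleElement (TextContent)

// Appends the text of this node and of all its descendants, in document
// order, to the buffer supplied by the caller. No static state is involved.
- (void)hpple_appendTextTo:(NSMutableString *)buffer
{
    if ([self isTextNode]) {
        NSString *piece = [self content];
        if (piece != nil) {
            [buffer appendString:piece];
        }
        return;
    }
    for (TFHppleElement *child in [self children]) {
        [child hpple_appendTextTo:buffer];
    }
}

// Rough counterpart of textContent: the concatenated text of all descendants.
- (NSString *)hppleTextContent
{
    // [NSMutableString string] is fine with and without ARC.
    NSMutableString *buffer = [NSMutableString string];
    [self hpple_appendTextTo:buffer];
    return buffer;
}

@end
```

With this in place, parseContents: could loop over the matched elements and join [element hppleTextContent] for each one, and both getText: and the static variable could be dropped.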