Inaccurate XML parsing due to line breaks in the content

I am trying to parse XML using NSXMLParser, but because some of the XML contains line breaks (<br>), the parsing is inaccurate. For example, "A genuine leader is not a<br> searcher for consensus<br>but, a molder of consensus" gets parsed as "a molder of consensus". Not only do the <br> tags mess it up, but the comma after "but" also seems to cause trouble; I am guessing that is because it sits directly next to the br tag without a space. Does anyone have experience fixing this? A lot of people on Stack Overflow seem to have the same issue, but I haven't been able to find a solution for iOS.

In the XML, the br tags appear escaped, like this:

&lt;br&gt;

This is the XML I am parsing:

<entry>
<title>Quote</title>
<content>A genuine leader is not a&lt;br&gt;
searcher for consensus&lt;br&gt;
but, a molder of consensus</content>
</entry>

This is my XML parsing code:

- (void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{   
    if ([elementname isEqualToString:@"entry"])
    {
        currentQuote = [[SQuote alloc] init];
    }

}

- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{

    if ([elementname isEqualToString:@"content"]){
        currentQuote.content = currentNodeContent;
    }

    if ([elementname isEqualToString:@"entry"])
    {
        [self.popularEntries addObject:currentQuote];
        currentQuote = nil;
        currentNodeContent = nil;
    }
}

EDIT:

I tried changing my foundCharacters: code to the following:

- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{


    if (currentNodeContent == nil)
        currentNodeContent = [[NSMutableString alloc] initWithCapacity: 20];

    [currentNodeContent appendString: [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]];

}

But I am still getting an error that says "Attempt to mutate immutable object with appendString:", even though currentNodeContent is declared as an NSMutableString.

There is 1 best solution below

Haven't looked at your code in detail, but you should be aware that SAX parsers do not promise that all contiguous text will be delivered as a single characters() call (parser:foundCharacters: in NSXMLParser). Entity references, like the &lt; and &gt; in your escaped <br>, are a classic case where many/most parsers will deliver text before them as one characters() call, the entity's expansion as another, and text following them as a third.

It is your application's responsibility to accumulate data from successive characters() calls until a non-characters() event comes in.

(There are reasons for this having to do with efficiency of SAX event delivery and parser buffer management and so on, but unless you're writing a parser, all you need to know is the rule in the previous paragraph.)

Any good SAX tutorial should illustrate ways to do this.
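
For NSXMLParser specifically, the accumulation could look roughly like the sketch below. It assumes ARC, and ContentCollector and CollectContents are hypothetical names made up for illustration, not anything from the question's code. The buffer is created when <content> starts, every foundCharacters: chunk is appended to it, and it is only read when <content> ends.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate for illustration; assumes ARC.
@interface ContentCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *buffer;  // text of the <content> being read
@property (nonatomic, strong) NSMutableArray *contents; // one NSString per finished <content>
@end

@implementation ContentCollector

- (instancetype)init {
    if ((self = [super init])) {
        _contents = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // Start a fresh, genuinely mutable buffer for the element we care about.
    if ([elementName isEqualToString:@"content"]) {
        self.buffer = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element: append, never assign.
    // Outside <content> the buffer is nil, so this message is a no-op.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"content"]) {
        // Store an immutable snapshot, then drop the buffer.
        [self.contents addObject:[self.buffer copy]];
        self.buffer = nil;
    }
}

@end

// Hypothetical helper: parse an XML string and return the text of every <content>.
static NSArray *CollectContents(NSString *xml) {
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    ContentCollector *collector = [[ContentCollector alloc] init];
    parser.delegate = collector;
    [parser parse];
    return collector.contents;
}
```

The sketch deliberately does not trim inside foundCharacters:, because trimming each chunk separately can drop whitespace that belongs between two pieces of the same text. Any trimming, or replacing the literal <br> with a newline, is probably easier to do once on the finished string in didEndElement.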

(Similar issues can arise with the DOM, if the parser has been told to retain entity boundaries or if the document has been edited since first being parsed. Applications should be prepared to find several Text nodes in succession as siblings, unless the DOM is known to be in normalized form.)