XmlException when processing RSS


I've been trying to process RSS feeds using Argotic for my newsreader application. For most of them it works fine, but on some feeds (like this one) it breaks with the following:

Additional information: For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.

The error message was straightforward, so I passed an XmlReaderSettings object with DtdProcessing set to Parse. But then the following appeared:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll
Additional information: The ';' character, hexadecimal value 0x3B, cannot be included in a name. Line 9, position 366.

The code I am using:

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;

    XmlReader reader = XmlReader.Create(this.url, settings);
    RssFeed feed = new RssFeed();
    feed.Load(reader);

What am I missing?


There are 2 best solutions below

0
On BEST ANSWER

It seems that setting DtdProcessing to Ignore, instead of Parse, solved my problem:

    settings.DtdProcessing = DtdProcessing.Ignore;
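
As a minimal, self-contained sketch of that setting in action: the inline XML string, the `example.invalid` DTD URL, and the `DtdIgnoreDemo`/`ReadTitle` names below are made up for illustration and stand in for the real feed. With `DtdProcessing.Ignore` the reader skips the DOCTYPE declaration instead of throwing or trying to parse the DTD.

```csharp
using System;
using System.IO;
using System.Xml;

static class DtdIgnoreDemo
{
    // Hypothetical helper: returns the text of the first <title> element.
    public static string ReadTitle(string xml)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true,
            // Skip the DOCTYPE declaration rather than prohibiting or parsing it.
            DtdProcessing = DtdProcessing.Ignore
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.ReadToFollowing("title");
            return reader.ReadElementContentAsString();
        }
    }

    static void Main()
    {
        // Made-up stand-in for a feed that carries a DOCTYPE declaration.
        string xml =
            "<!DOCTYPE rss SYSTEM \"http://example.invalid/rss.dtd\">" +
            "<rss version=\"2.0\"><channel><title>Demo</title></channel></rss>";

        Console.WriteLine(ReadTitle(xml)); // prints "Demo"
    }
}
```

One caveat, as far as I know: because the DTD is ignored, any entities it declares are not available, so a feed that actually references such entities in its body may still fail to parse.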
2
On

The exception is telling you that the RSS feed is not well-formed: specifically, that a name contains the ';' character. The W3C XML specification appears to prohibit this:

Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names

Since other RSS readers also complained, the feed was likely invalid. At the time of writing, however, the W3C validator shows it to be valid!

According to the MSDN documentation for XmlReaderSettings.ConformanceLevel, this issue will cause an exception whatever your ConformanceLevel, but you might find another setting in XmlReaderSettings that turns the behaviour off (pass the settings to XmlReader.Create). Otherwise, if the feed can't be fixed at the source, you'll have to pre-process it before handing it to the parser.
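
A rough sketch of what such pre-processing could look like: read the feed as text, apply a fix, then parse the cleaned string. The fix shown here (stripping the DOCTYPE declaration with a regular expression) is only one guess at what a given feed needs, and `FeedPreprocessor`, `StripDoctype`, and the sample XML are hypothetical names and data, not part of Argotic or .NET.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

static class FeedPreprocessor
{
    // Matches a DOCTYPE declaration, with or without an internal subset in [ ... ].
    // Assumption: the internal subset itself contains no ']' character.
    private static readonly Regex Doctype = new Regex(
        @"<!DOCTYPE[^>\[]*(\[[^\]]*\])?[^>]*>",
        RegexOptions.IgnoreCase);

    // Removes the first DOCTYPE declaration, if any, and returns the rest unchanged.
    public static string StripDoctype(string xml)
    {
        return Doctype.Replace(xml, string.Empty, 1);
    }
}

static class PreprocessDemo
{
    static void Main()
    {
        // Made-up feed text; in practice this would be downloaded from the feed URL.
        string raw =
            "<?xml version=\"1.0\"?>\n" +
            "<!DOCTYPE rss [<!ENTITY x \"y\">]>\n" +
            "<rss version=\"2.0\"><channel><title>Demo</title></channel></rss>";

        string cleaned = FeedPreprocessor.StripDoctype(raw);

        // Default settings prohibit DTDs, so this only succeeds because the DOCTYPE is gone.
        using (XmlReader reader = XmlReader.Create(new StringReader(cleaned)))
        {
            reader.ReadToFollowing("title");
            Console.WriteLine(reader.ReadElementContentAsString()); // prints "Demo"
        }
    }
}
```

The reader created from the cleaned string can then be passed to `feed.Load(reader)` exactly as in the question. A regular expression is a blunt tool for XML, so this is only reasonable for narrow, well-understood defects in a specific feed.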