How to determine which of several different errors might have caused XmlException?


The system I'm working on uses DataSet.ReadXml(XmlReader) to read an XML file and load its contents to a DataSet. The XML file is from a business partner and may not always be well-formed, but this system is expected to perform reasonable corrections to the input.

We've seen errors in the XML input files, such as:

  • Case 1: In the middle of a string value, use of characters such as '<', '>', or my favorite, '&', which causes "An error occurred while parsing EntityName. Line x, position y."
  • Case 2: In the middle of a string value, weird constructs such as "<3" so that the text depicts a heart, which causes "Name cannot begin with the '3' character. Line x, position y."
  • Case 3: Invalid characters for the given encoding, which causes "Invalid character in the given encoding. Line x, position y."

If some simple rules are adopted, these errors can be addressed programmatically:

  • Case 1: Replace the offending character with its XML character entity ("&" becomes "&amp;", etc.)
  • Case 2: Replace the "<" in "<3" with a space, so that it becomes " 3"
  • Case 3: Replace the invalid character with a space

However, all of these errors raise the same exception: System.Xml.XmlException

I would like to take an appropriate action when any of these errors is encountered, but what's the best way to do that? All three errors have the same HRESULT value (-2146232000), and so far the only way I have been able to tell them apart is by inspecting the XmlException.Message string property.

String comparison seems a lousy way to determine the exact cause of the error. Were I to follow that approach, the code would break should the exception message change in future versions of .NET. It would also not be portable to some languages.

Therefore, how does one programmatically differentiate between the various types of errors that could be represented in an XmlException?

EDIT

In the comments below I've received advice about the importance of ensuring that XML data is of high quality. I don't disagree, but as my question states, it's outside my control and I can do nothing about it. So, as well-intentioned as your remarks are, they miss the point. If you know a good way to differentiate amongst the very many errors that can be presented by the System.Xml.XmlException class, please, share your knowledge. Thank you.

Answer by Michael Kay

If you really must process non-XML, then rather than feeding it to an XML parser and catching the errors, I would preprocess it with a parser for the particular non-XML grammar that you want to accept. Before the data ever reaches an XML parser, run it through a Perl script or similar that recognizes the patterns you want to convert to XML, then run the result through an XML parser.
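
As a rough illustration of that preprocessing step, here is a minimal sketch in Python (standing in for the Perl script) that applies the three repair rules from the question before the text is handed to an XML parser. The function name `repair` and the three pattern names are hypothetical, the default UTF-8 encoding is an assumption, and the patterns are heuristics rather than a real grammar: text such as "a<b" still looks like the start of a tag and is not repaired. A bare '>' is left alone because XML permits it in character data.

```python
import re
import xml.etree.ElementTree as ET

# '&' that does not begin a well-formed entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# '<' immediately followed by a digit, such as the "<3" heart (Case 2).
LT_DIGIT = re.compile(r"<(?=[0-9])")
# Any other '<' that cannot start a tag, comment, CDATA section or PI.
STRAY_LT = re.compile(r"<(?![A-Za-z_:/!?])")


def repair(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: apply the question's three rules to raw input."""
    # Case 3: undecodable bytes become U+FFFD, which is then replaced by a
    # space. (A genuine U+FFFD in the input is also replaced; that is an
    # accepted limitation of this sketch.)
    text = raw.decode(encoding, errors="replace").replace("\ufffd", " ")
    # Case 1: escape ampersands that are not part of a reference.
    text = BARE_AMP.sub("&amp;", text)
    # Case 2: "<3" becomes " 3".
    text = LT_DIGIT.sub(" ", text)
    # Case 1 again: escape any remaining '<' that cannot open markup.
    text = STRAY_LT.sub("&lt;", text)
    return text


if __name__ == "__main__":
    sample = b"<r><a>Tom & Jerry</a><b>I <3 XML</b><c>caf\xff</c></r>"
    root = ET.fromstring(repair(sample))  # parses once the input is repaired
    for child in root:
        print(child.tag, repr(child.text))
```

Because the repairs happen before parsing, the XML parser only ever sees input that should be well-formed, so there is no need to classify the exceptions it would otherwise raise.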