Dealing with Invalid XML characters using XElement

I have a C# API that returns an XElement object. This XElement object is constructed by code that looks like this:

string invalidXML = "a \v\f\0";    
XElement fe = new XElement("Data", invalidXML);
Console.WriteLine(fe);

From observation, I initially believed that passing an invalid XML character to the XElement constructor above throws a System.ArgumentException.

As it turns out, XElement does not throw when a string containing invalid XML characters is passed to its constructor. The exception only appears when you print the XElement, for example via Console.WriteLine(fe), and it comes from the XmlWriter:

System.ArgumentException: '', hexadecimal value 0x0B, is an invalid character.
   at System.Xml.XmlEncodedRawTextWriter.InvalidXmlChar(Int32 ch, Char* pDst, Boolean entitize)
   at System.Xml.XmlEncodedRawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlEncodedRawTextWriter.WriteString(String text)
   at System.Xml.XmlEncodedRawTextWriterIndent.WriteString(String text)
   at System.Xml.XmlWellFormedWriter.WriteString(String text)
   at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
   at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
   at System.Xml.Linq.XNode.GetXmlString(SaveOptions o)
   at System.Xml.Linq.XNode.ToString()
   at System.IO.TextWriter.WriteLine(Object value)
   at System.IO.TextWriter.SyncTextWriter.WriteLine(Object value)
   at System.Console.WriteLine(Object value)
   at TestLoggingForUNIT.Program.Main(String[] args) in C:\Users\shivanshu\source\repos\TestLoggingForUNIT\TestLoggingForUNIT\Program.cs:line 29

It seems to me that XElement itself does not do any validation. Only when it is printed or serialized does .NET internally call the XmlWriter, and that is when the exception is thrown.

My question is: does XElement always guarantee that an exception will be thrown if an invalid XML character is passed?

In other words, do I need to check the string that I am passing for invalid XML characters, using something like XmlConvert.IsXmlChar(char) on each character?

I looked at the link below but could not find a satisfactory answer to my question:

https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/valid-content-of-xelement-and-xdocument-objects3

There is 1 best solution below

It is the XmlWriter that is verifying that valid characters are being written. In the official documentation, the relevant XmlWriter configuration is described in the Data Conformance section:

Data conformance

An XML writer uses two properties from the XmlWriterSettings class to check for data conformance:

The CheckCharacters property instructs the XML writer to check characters and throw an XmlException exception if any characters are outside the legal range, as defined by the W3C.

The ConformanceLevel property configures the XML writer to check that the stream being written complies with the rules for a well-formed XML 1.0 document or document fragment, as defined by the W3C. The three conformance levels are described in the following table. The default is Document. For details, see the XmlWriterSettings.ConformanceLevel property and the System.Xml.ConformanceLevel enumeration.

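Yes, with the CheckCharacters flag set to true (the default), the XmlWriter is guaranteed to throw an exception when it encounters an illegal character. The check happens in the writer at serialization time, not in the XElement constructor.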
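
To make the timing concrete, here is a minimal sketch of the behavior described in the question: construction succeeds, and only serialization fails. The class and method names (XElementValidationDemo, SerializationThrows, IsValidXmlText) are made up for illustration. The up-front check uses XmlConvert.VerifyXmlChars, which is one possible way to validate a string before it reaches the writer, not something the documentation above prescribes. Both exception types are caught because the stack trace in the question shows ArgumentException while the documentation mentions XmlException.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XElementValidationDemo
{
    // Hypothetical helper: true if serializing with default settings throws.
    public static bool SerializationThrows(XElement element)
    {
        try
        {
            // ToString() writes through an XmlWriter, which checks characters.
            element.ToString();
            return false;
        }
        catch (Exception ex) when (ex is ArgumentException || ex is XmlException)
        {
            return true;
        }
    }

    // Hypothetical helper: validate the text up front instead of at write time.
    public static bool IsValidXmlText(string text)
    {
        try
        {
            // VerifyXmlChars throws XmlException on the first illegal character.
            XmlConvert.VerifyXmlChars(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string invalidXml = "a \v\f\0";

        // The constructor stores the string as-is; nothing is validated here.
        XElement fe = new XElement("Data", invalidXml);
        Console.WriteLine("Constructed without error.");

        Console.WriteLine("Valid XML text: " + IsValidXmlText(invalidXml));
        Console.WriteLine("Serialization throws: " + SerializationThrows(fe));
    }
}
```

If the API should never hand out an XElement that cannot be serialized, a check like IsValidXmlText before constructing the element moves the failure to the point where the bad data enters.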

If you want to allow writing invalid characters, set the CheckCharacters flag to false in the XmlWriterSettings for your XmlWriter, which prevents the exception from being thrown. Normally, the XmlWriter encodes reserved characters as character entities (e.g. < to &lt;). With the flag set to false, it additionally escapes illegal characters in text content as numeric character references (e.g. \f to &#xC;). Be aware that XML 1.0 does not permit character references to illegal characters either, so a strict parser may still reject the resulting text.
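
A minimal sketch of that configuration, assuming the element is written to a string through XmlWriter.Create rather than through XElement.ToString() (which uses its own default settings). The class and method names (LenientXmlWriterDemo, Serialize) are hypothetical, and OmitXmlDeclaration is set only to keep the output short.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class LenientXmlWriterDemo
{
    // Hypothetical helper: serialize without character checking.
    public static string Serialize(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            CheckCharacters = false,   // do not throw on illegal characters
            OmitXmlDeclaration = true  // assumption: keep the output compact
        };

        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            // Write the element through the lenient writer.
            element.WriteTo(writer);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var fe = new XElement("Data", "a \v\f");

        // Illegal characters come out as numeric character references
        // instead of raising an exception.
        Console.WriteLine(Serialize(fe));
    }
}
```

Reading the result back with a default XmlReader may fail for the reason noted above, so this approach fits logging or diagnostics better than data that must round-trip.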