Unity C# System.Xml.Linq - XDocument changes the encoding in an XML declaration

I am attempting to modify a utf-16 encoded XML file in C# (specifically Unity 2017.4.33f1).

EDIT: Turns out the original file specified a utf-8 encoding!

I am loading the document using this code:

using (FileStream fileStream = new FileStream(inPath, FileMode.Open, FileAccess.Read))
{
   _Document = XDocument.Load(fileStream);
}

When inspecting the object from a debugger, the XDocument seems to have loaded the declaration of the document as UTF-8, even though the original document specifies UTF-16.

[Screenshot: debugger view of the XDocument]

Why is this happening? Is there any way to stop the XDocument from changing the encoding when loading a file?

Best answer, by Yuzu:

tl;dr: Use XDocument.Save() or one of its overloads instead of writing out the result of XDocument.ToString().

Based on the discussion in the comments on the question, this seems to be the behavior of the .NET implementation in Unity 2017.4.33f1:

XDocument.ToString() returns the XML as a string and changes the in-document encoding declaration to utf-16, regardless of the encoding specified in the object or the source file. .NET strings are always UTF-16 encoded, so this is the likely source of the behavior. The output is valid XML, but it does not accurately reflect the XDocument object that ToString() was called on. This means that code like:

XDocument doc = XDocument.Load(path);
System.Text.Encoding enc = System.Text.Encoding.GetEncoding(doc.Declaration.Encoding);
System.IO.File.WriteAllText(path, doc.ToString(), enc);

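will write invalid XML if the document was not originally UTF-16 encoded: the declaration in the string says utf-16, while the bytes are written in the document's original encoding.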

XDocument.Save(string path) respects the encoding specified in XDocument.Declaration and will save the file with that encoding.
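
A minimal sketch of that load-and-save round trip follows. The class name XmlRoundTrip and the method Copy are hypothetical names chosen for illustration. It also assumes the source file's declaration names an encoding the runtime recognizes. I have only reasoned about this against the standard System.Xml.Linq API, so treat it as a starting point rather than a verified Unity recipe.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XmlRoundTrip
{
    // Loads the XML file at inPath and writes it to outPath.
    // XDocument.Save(string) picks its output encoding from
    // doc.Declaration.Encoding, so no manual Encoding lookup and no
    // ToString() call is needed.
    public static void Copy(string inPath, string outPath)
    {
        XDocument doc = XDocument.Load(inPath);

        // Any edits to the document would go here.

        doc.Save(outPath);
    }
}
```

Usage would look like XmlRoundTrip.Copy("input.xml", "output.xml"), where both file names are placeholders. Because the document never passes through a .NET string, the declaration and the bytes on disk should stay consistent with each other.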