I want to be able to write fragments of indented XML with no namespaces, no XML preambles, and \n for line endings, using XmlSerializer for each fragment and using a single XmlWriter instance for all the fragments. How can I do that?
XmlSerializer.Serialize() produces indented output when serializing to a generic output Stream, but it uses "\r\n" for line endings and I can't find how to configure that.
I can serialize to an XmlWriter, which can be configured in detail, but the configuration only seems to work when you output a single complete document rather than multiple document fragments, because XmlSerializer will throw an exception with ConformanceLevel.Fragment:
WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment
And if, as a workaround, I call XmlWriter.WriteWhitespace(""), indentation gets disabled. Specifically, if I create an XmlWriter like this:
XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
ConformanceLevel = ConformanceLevel.Fragment,
NamespaceHandling = NamespaceHandling.Default,
NewLineChars = "\n",
Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
Indent = true,
NewLineHandling = NewLineHandling.Replace,
OmitXmlDeclaration = true,
WriteEndDocumentOnClose = false,
CloseOutput = false,
CheckCharacters = false,
NewLineOnAttributes = false,
});
MVE
https://dotnetfiddle.net/qGLIlL
It does allow me to serialize multiple objects without creating a new XmlWriter for each one. But there's no indentation.
<flyingMonkey name="Koko"><limbs><limb name="leg" /><limb name="arm" /><limb name="tail" /><limb name="wing" /></limbs></flyingMonkey>
<flyingMonkey name="New Name"><limbs><limb name="leg" /><limb name="arm" /><limb name="tail" /><limb name="wing" /></limbs></flyingMonkey>
public class Program {
public static void Main()
{
XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
ConformanceLevel = ConformanceLevel.Fragment,
NamespaceHandling = NamespaceHandling.Default,
NewLineChars = "\n",
Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
Indent = true,
NewLineHandling = NewLineHandling.Replace,
OmitXmlDeclaration = true,
WriteEndDocumentOnClose = false,
CloseOutput = false,
CheckCharacters = false,
NewLineOnAttributes = false,
});
var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
// without this line I get an error:
// "WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment."
xw.WriteWhitespace("");
FlyingMonkey monkey = FlyingMonkey.Create();
XmlSerializer ser = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
ser.Serialize(xw, monkey, noNamespace);
xw.WriteWhitespace("\n\n");
monkey.name = "New Name";
ser.Serialize(xw, monkey, noNamespace);
}
}
[System.Xml.Serialization.XmlTypeAttribute(TypeName = "flyingMonkey", Namespace=null)]
public class FlyingMonkey
{
[System.Xml.Serialization.XmlAttributeAttribute()]
public string name;
public Limb[] limbs;
public static FlyingMonkey Create() =>
new FlyingMonkey()
{
name = "Koko",
limbs = new Limb[]
{
new Limb() { name = "leg" }, new Limb() { name = "arm" },
new Limb() { name = "tail" }, new Limb() { name = "wing" },
}
};
}
[System.Xml.Serialization.XmlTypeAttribute(TypeName = "limb", Namespace=null)]
public class Limb
{
[System.Xml.Serialization.XmlAttributeAttribute()]
public string name;
}
What kinda works:
XmlWriter xw = XmlWriter.Create(System.Console.Out, new XmlWriterSettings()
{
ConformanceLevel = ConformanceLevel.Auto,
NamespaceHandling = NamespaceHandling.Default,
NewLineChars = "\n",
Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // suppress BOM
Indent = true,
NewLineHandling = NewLineHandling.Replace,
OmitXmlDeclaration = true,
WriteEndDocumentOnClose = false,
CloseOutput = false,
CheckCharacters = false,
NewLineOnAttributes = false,
});
var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
// This is not needed anymore. If I invoke that, it will kill indentation for some reason.
// xw.WriteWhitespace("");
FlyingMonkey monkey = FlyingMonkey.Create();
XmlSerializer ser = new XmlSerializer(typeof(FlyingMonkey), defaultNamespace: null);
ser.Serialize(xw, monkey, noNamespace);
// xw.WriteWhitespace("\n\n");
// monkey.name = "New Name";
// ser.Serialize(xw, monkey, noNamespace); // this second serialization throws InvalidOperationException
It does print with the right line endings, but it won't let you write another object to the same XmlWriter instance.
<flyingMonkey name="Koko">
<limbs>
<limb name="leg" />
<limb name="arm" />
<limb name="tail" />
<limb name="wing" />
</limbs>
</flyingMonkey>
I want to reuse my single instance of XmlWriter because I need to write up to 100k elements, and creating an XmlWriter for each one adds a lot of overhead.
Your basic problem is that XmlSerializer is designed to serialize to a single well-formed XML document, not to a sequence of fragments. If you attempt to serialize to XML with ConformanceLevel.Fragment, you will get an exception.
In this answer to .net XmlSerialize throws "WriteStartDocument cannot be called on writers created with ConformanceLevel.Fragment", Wim Reymen identifies two workarounds:
Calling XmlWriter.WriteWhitespace("") before the first call to serialization.
Unfortunately, as you have noticed, this disables indentation. The reason this happens is that XmlWriter disables indentation for mixed content, and the call to write whitespace triggers the mixed content detection of XmlEncodedRawTextWriterIndent (demo here).
Calling XmlWriter.WriteComment("") before the first serialization.
While this does not disable indentation, it does of course write a comment that you don't want.
So what are your options for a workaround?
Firstly, as you noticed, you could create a separate XmlWriter for each item with CloseOutput = false. In comments you wrote that doing so "adds a lot of overhead, I need to be writing up to 100k elements, so was hoping to reuse the writer instance", but I recommend you profile to make sure, because this workaround is very, very simple compared to the alternatives.
Assuming you are writing to a Stream, you could wrap this in an extension method that creates and disposes a fresh XmlWriter around each call to Serialize(), and then call that method once per item.
Demo fiddle #1 here.
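A minimal sketch of the one-writer-per-item approach might look like the following. It is not the code from the linked fiddle: the class name XmlFragmentExtensions, the method name SerializeFragment, and the choice to append a single "\n" after each item are assumptions made for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper (names are illustrative, not from the original answer).
public static class XmlFragmentExtensions
{
    // One shared settings object; a new XmlWriter is still created per item.
    static readonly XmlWriterSettings Settings = new XmlWriterSettings
    {
        Indent = true,
        NewLineChars = "\n",
        OmitXmlDeclaration = true,
        CloseOutput = false, // disposing the writer must not close the stream
        Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), // no BOM
    };

    public static void SerializeFragment<T>(
        this XmlSerializer serializer, Stream stream, T value,
        XmlSerializerNamespaces namespaces)
    {
        // Each item is written as its own complete document, so the default
        // ConformanceLevel.Document is fine and indentation works as usual.
        using (var writer = XmlWriter.Create(stream, Settings))
        {
            serializer.Serialize(writer, value, namespaces);
        }
        // Assumption: separate consecutive items with a single line feed.
        stream.WriteByte((byte)'\n');
    }
}
```

With the types from the question, this could be called in a loop as ser.SerializeFragment(stream, monkey, noNamespace), where ser and noNamespace are created once and reused. Whether the per-item writer allocation matters for 100k elements is something only profiling can tell.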
Alternatively, if you really need to reuse the XmlWriter for performance reasons, you will need to call XmlWriter.WriteComment() to prevent the exception from XmlSerializer and edit out the unwanted comments afterwards, e.g. via some TextWriter decorator that removes them on the fly as they are being written.
An extension method built on such a decorator seems to do this (demo fiddle #2 here), but honestly I doubt it's worth the trouble.
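As a rough illustration of the decorator idea, here is a sketch that is not the code from the linked fiddle. The names EmptyCommentStrippingWriter and FragmentSerialization.SerializeAll are hypothetical, and the sketch assumes the unwanted comment is always the empty comment "<!---->" produced by WriteComment("").

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical decorator: forwards everything to an inner TextWriter except
// occurrences of the empty comment "<!---->".
public sealed class EmptyCommentStrippingWriter : TextWriter
{
    static readonly char[] Marker = "<!---->".ToCharArray();
    readonly TextWriter inner;
    int matched; // number of leading Marker characters currently held back

    public EmptyCommentStrippingWriter(TextWriter inner) { this.inner = inner; }

    public override Encoding Encoding => inner.Encoding;

    // The base class routes the other Write overloads through Write(char).
    public override void Write(char value)
    {
        if (value == Marker[matched])
        {
            // Hold the character back; drop the run once the marker is complete.
            if (++matched == Marker.Length) matched = 0;
            return;
        }
        // Mismatch: release what was held back. '<' occurs only at the start
        // of Marker, so the released characters cannot begin a new match.
        inner.Write(Marker, 0, matched);
        matched = 0;
        if (value == Marker[0]) matched = 1; else inner.Write(value);
    }

    public override void Flush() { inner.Flush(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            inner.Write(Marker, 0, matched); // release any incomplete match
            matched = 0;
            inner.Flush(); // the inner writer stays open; the caller owns it
        }
        base.Dispose(disposing);
    }
}

public static class FragmentSerialization
{
    public static void SerializeAll<T>(TextWriter output, IEnumerable<T> items)
    {
        var settings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            Indent = true,
            NewLineChars = "\n",
            OmitXmlDeclaration = true,
            CloseOutput = false,
        };
        var serializer = new XmlSerializer(typeof(T));
        var noNamespace = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });
        using (var filter = new EmptyCommentStrippingWriter(output))
        using (var writer = XmlWriter.Create(filter, settings))
        {
            writer.WriteComment(""); // moves the writer out of WriteState.Start
            foreach (var item in items)
                serializer.Serialize(writer, item, noNamespace);
        }
    }
}
```

The sketch overrides only Write(char) for brevity, which costs a virtual call per character; overriding Write(char[], int, int) as well would likely be faster. It also makes no attempt to trim any line break the XmlWriter emits right after the comment.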
With either approach, the output looks like