How do I differentiate types of XML files before deserializing?


I am loading MusicXML files into my program. The problem: there are two “dialects”, timewise and partwise, which have different root nodes (and a different structure):

<?xml version="1.0" encoding='UTF-8' standalone='no' ?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 2.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="2.0">
    <work>...</work>
    ...
</score-partwise>

and

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-timewise PUBLIC "-//Recordare//DTD MusicXML 2.0 Timewise//EN" "http://www.musicxml.org/dtds/timewise.dtd">
<score-timewise version="2.0">
   <work>...</work>
   ...
</score-timewise>

My code for deserializing the partwise score so far is:

using (var fileStream = new FileStream(openFileDialog.FileName, FileMode.Open))
{
    var xmlSerializer = new XmlSerializer(typeof(ScorePartwise));
    var result = (ScorePartwise)xmlSerializer.Deserialize(fileStream);
}

What would be the best way to differentiate between the two dialects?

3 Answers

Best answer

Here's a way to do it: parse the file with an XDocument, read the root element's name to determine the type, and then deserialize from the already-loaded document with the matching serializer.

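// Load the document once, then pick the target type from the root element's name.
var xdoc = XDocument.Load(filePath);
Type type;
if (xdoc.Root.Name.LocalName == "score-partwise")
    type = typeof(ScorePartwise);
else if (xdoc.Root.Name.LocalName == "score-timewise")
    type = typeof(ScoreTimewise);
else
    throw new NotSupportedException("Unexpected root element: " + xdoc.Root.Name.LocalName);
var xmlSerializer = new XmlSerializer(type);
// Deserialize from the parsed document instead of reading the file a second time.
var result = xmlSerializer.Deserialize(xdoc.CreateReader());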
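
If loading the whole file into an XDocument is more than you need, the root element can also be read with a plain XmlReader, which stops at the first element. The following is a minimal sketch, not part of the answer above; the class and method names (MusicXmlSniffer, ReadRootName) are hypothetical, and it assumes the DOCTYPE can simply be ignored.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MusicXmlSniffer
{
    // Hypothetical helper: returns the local name of the root element,
    // reading only as far as the first element of the document.
    public static string ReadRootName(Stream stream)
    {
        var settings = new XmlReaderSettings
        {
            // MusicXML files carry a DOCTYPE; the default setting
            // (DtdProcessing.Prohibit) would throw an XmlException.
            DtdProcessing = DtdProcessing.Ignore
        };
        // CloseInput defaults to false, so disposing the reader leaves the stream open.
        using (var reader = XmlReader.Create(stream, settings))
        {
            // Skips the XML declaration, DOCTYPE, comments and whitespace.
            reader.MoveToContent();
            return reader.LocalName;
        }
    }
}
```

The reader has consumed part of the stream at this point, so a seekable stream needs to be rewound (fileStream.Position = 0) before it is handed to the chosen XmlSerializer.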

Answer

I would create both serializers:

var partwiseSerializer = new XmlSerializer(typeof(ScorePartwise));
var timewiseSerializer = new XmlSerializer(typeof(ScoreTimewise));

Assuming there are only these two formats, I would call the CanDeserialize method on one of them and fall back to the other:

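using (var fileStream = new FileStream(openFileDialog.FileName, FileMode.Open))
{
  // MusicXML files contain a DOCTYPE, which XmlReader.Create rejects by default
  // (DtdProcessing.Prohibit), so tell the reader to ignore it.
  var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
  using (var xmlReader = XmlReader.Create(fileStream, settings))
  {
    // CanDeserialize checks the root element without consuming it,
    // so the same reader can be passed to Deserialize afterwards.
    if (partwiseSerializer.CanDeserialize(xmlReader))
    {
       var result = partwiseSerializer.Deserialize(xmlReader);
    }
    else
    {
       var result = timewiseSerializer.Deserialize(xmlReader);
    }
  }
}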

This is only a sketch of the idea. If there were more formats, or depending on your application design, I would call CanDeserialize in a more structured way, but that method is the key in my opinion:

http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.candeserialize.aspx

The XmlReader class can be found here:

http://msdn.microsoft.com/en-us/library/System.Xml.XmlReader(v=vs.110).aspx
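
One possible "more structured" variant is to keep the serializers in a list and use the first one whose CanDeserialize returns true. This is a rough sketch under assumptions: ScoreLoader is a hypothetical name, and the two score classes below are tiny stand-ins so the example compiles on its own (the real classes would model the full MusicXML schema). The DtdProcessing.Ignore setting is included because MusicXML files carry a DOCTYPE.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Stand-in types for illustration only.
[XmlRoot("score-partwise")]
public class ScorePartwise
{
    [XmlAttribute("version")] public string Version { get; set; }
}

[XmlRoot("score-timewise")]
public class ScoreTimewise
{
    [XmlAttribute("version")] public string Version { get; set; }
}

public static class ScoreLoader
{
    // Add further serializers here if more root elements need to be supported.
    private static readonly XmlSerializer[] Serializers =
    {
        new XmlSerializer(typeof(ScorePartwise)),
        new XmlSerializer(typeof(ScoreTimewise))
    };

    // Returns a ScorePartwise or ScoreTimewise instance, depending on the root element.
    public static object Load(Stream stream)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (var reader = XmlReader.Create(stream, settings))
        {
            foreach (var serializer in Serializers)
            {
                // The check does not consume the root element, so it can be repeated.
                if (serializer.CanDeserialize(reader))
                    return serializer.Deserialize(reader);
            }
            throw new InvalidDataException("No serializer matches the root element.");
        }
    }
}
```

The caller can then branch on the runtime type of the result, for example with `result is ScorePartwise`.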

Answer

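If you're concerned about resource usage, you can skip the XML parser and inspect the start of the string yourself. This example checks whether the root element starts with "<Error "; it assumes the document begins with exactly one of the XML declarations in Constants, followed by a single character (such as a newline):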

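    internal const string NodeStart = "<Error ";

    // Returns true if the text right after a known XML declaration starts with NodeStart.
    // No XML parsing is performed.
    public static bool IsErrorDocument(string xml)
    {
        // Start at 1 to skip the single character expected after the declaration.
        int headerLen = 1;
        if (xml.StartsWith(Constants.XMLHEADER_UTF8, StringComparison.Ordinal))
        {
            headerLen += Constants.XMLHEADER_UTF8.Length;
        }
        else if (xml.StartsWith(Constants.XMLHEADER_UTF16, StringComparison.Ordinal))
        {
            headerLen += Constants.XMLHEADER_UTF16.Length;
        }
        else
        {
            return false;
        }
        if (xml.Length < headerLen + NodeStart.Length)
        {
            return false;
        }
        // Ordinal comparison in place, without allocating a substring.
        return string.CompareOrdinal(xml, headerLen, NodeStart, 0, NodeStart.Length) == 0;
    }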

internal class Constants
{
    public const string XMLHEADER_UTF16 = "<?xml version=\"1.0\" encoding=\"utf-16\"?>";
    public const string XMLHEADER_UTF8 = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
}