How to parse XML with Python xmlschema and preserve order of elements


I have to parse XML files whose root element contains an xs:choice. Some element types are hex-encoded in little-endian byte order, others in big-endian order. Using a schema, I can define a different type for each.

I am trying to use the xmlschema package, where the value_hook argument of decode lets me alter how element values are parsed. The value_hook is a callback that receives each value from the XML file together with its XSD type. This lets me convert the hex values into ints with the correct endianness.
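A minimal sketch of such a hook follows. It assumes the two hex types are named simple types whose values reach the hook as hex text (for example restrictions of xs:string); the type names `hexLE` and `hexBE` are hypothetical placeholders for whatever the .xsd actually defines. It looks up the type's `local_name` (None for anonymous types) and leaves every other value unchanged.

```python
# Hypothetical XSD type names: replace them with the names used in your schema.
LITTLE_ENDIAN_TYPES = {"hexLE"}
BIG_ENDIAN_TYPES = {"hexBE"}


def hex_value_hook(value, xsd_type):
    """Convert hex text to int, choosing the byte order from the XSD type name."""
    # local_name is None for anonymous types; getattr also tolerates objects without it.
    type_name = getattr(xsd_type, "local_name", None)
    if type_name in LITTLE_ENDIAN_TYPES:
        byteorder = "little"
    elif type_name in BIG_ENDIAN_TYPES:
        byteorder = "big"
    else:
        return value  # not one of the hex types: pass the value through untouched
    # Assumption: the value is hex text such as "0100"; str() is a defensive coercion.
    raw = bytes.fromhex(str(value).strip())
    return int.from_bytes(raw, byteorder)
```

It would be passed as `xml_schema.decode(xml_file, value_hook=hex_value_hook)`. Matching on the type name rather than the tag keeps the endianness decision in the schema, which is the point of defining separate types.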

The problem is that the xmlschema decode function returns the parsed XML file as dictionaries, and repeated child tags are grouped under a single key. This does not preserve the order of the elements in the root element.

My example XML file:

<?xml version="1.0" encoding="UTF-8"?>

<A>
    <B>b1</B>
    <C>c</C>
    <B>b2</B>
</A>

is parsed into {'B': ['b1', 'b2'], 'C': ['c']}, so the sequence B, C, B is lost.

I need something similar to ElementTree, where I can iterate over all children of an element in document order, combined with the value_hook feature of xmlschema or some other access to the type defined in the XSD file.
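For comparison, here is a sketch of the plain ElementTree approach. It does keep document order, but note that it swaps in a different technique: ElementTree does not read the XSD, so the type information is replaced by a hypothetical hand-written `TAG_BYTEORDER` map that would have to mirror the schema manually.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XSD types: tag -> byte order, maintained by hand.
TAG_BYTEORDER = {"B": "little", "C": "big"}


def ordered_children(xml_text):
    """Yield (tag, value) for each child of the root element, in document order."""
    root = ET.fromstring(xml_text)
    for child in root:  # iterating an Element yields its children in document order
        text = (child.text or "").strip()
        byteorder = TAG_BYTEORDER.get(child.tag)
        if byteorder is None:
            yield child.tag, text  # unknown tag: keep the raw text
        else:
            yield child.tag, int.from_bytes(bytes.fromhex(text), byteorder)
```

With `<A><B>0100</B><C>0100</C><B>0200</B></A>` this yields ('B', 1), ('C', 256), ('B', 2). The drawback is exactly what the question is trying to avoid: the endianness lives in Python code instead of the schema.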

Thanks for any info.

Python code:

from xmlschema import XMLSchema

def parsing_value_hook(value, xsd):
    print(f'parsing hook: {value}, {xsd.name}')
    return value

test_prefix = 'test'
xml_file = test_prefix + '.xml'
xsd_file = test_prefix + '.xsd'

xml_schema = XMLSchema(xsd_file)
parsed = xml_schema.decode(xml_file, value_hook=parsing_value_hook)

print(parsed)
Answer:

To preserve the order of children when tags repeat, use JsonMLConverter as the converter, e.g.:

>>> import xmlschema
>>> xmlschema.to_dict('collection.xml', converter=xmlschema.JsonMLConverter)

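JsonML (JSON Markup Language) is a convention designed for exactly this purpose: each element becomes a list of the form [tag, optional attributes, child, ...], so the document order of the children, including interleaved repeated tags, is kept.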
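Because JsonML output is nested lists rather than dicts, it can be walked in order much like ElementTree children. The sketch below assumes the generic JsonML shape (for the question's file, something like `['A', ['B', 'b1'], ['C', 'c'], ['B', 'b2']]`); the exact output of xmlschema's converter, for instance namespace prefixes on tags, has not been verified here. Passing `value_hook` together with `converter` in the same `decode` call is also an assumption.

```python
def _content(node):
    """Return a JsonML node's children, skipping the tag and any attribute dict."""
    rest = node[1:]
    if rest and isinstance(rest[0], dict):
        rest = rest[1:]
    return rest


def iter_jsonml_children(node):
    """Yield (tag, content) for each child element of a JsonML node, in order."""
    for child in _content(node):
        if isinstance(child, list):  # bare strings are text nodes, not elements
            yield child[0], _content(child)


# Assumed usage (requires xmlschema and the question's files, not executed here):
#   from xmlschema import XMLSchema, JsonMLConverter
#   data = XMLSchema("test.xsd").decode("test.xml", converter=JsonMLConverter,
#                                       value_hook=parsing_value_hook)
#   for tag, content in iter_jsonml_children(data):
#       print(tag, content)
```

If the hook converts the hex values, the content lists would hold ints instead of strings; the walk itself does not change.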