Assigning tags by order to XML using ElementTree in Python


I have an XML file with multiple parent elements and nested children, and I want to add a new "order" attribute to each level element to indicate its position among its siblings.

At the moment I have something like this:

<root>
      <level id='FRUITS'>
        <level id='APPLE'>
          <heading>APPLE HEADER </heading>
          <level id='APPLE_ONE'>
            <file fileName='APPLE_ONE_I' />
            <heading>This is Apple One I.</heading>
            <file fileName='APPLE_ONE_II' />
            <heading>This is Apple One II.</heading>
          </level>
          <level id='APPLE_TWO'>
            <file fileName='APPLE_TWO_II' />
            <heading>This is Apple One I.</heading>
          </level>
        </level>
        <level id='ORANGE'>
          <heading>ORANGE HEADER</heading>
          <level id='ORANGE_ONE'>
            <file fileName='ORANGE_ONE_I' />
            <heading>This is Orange One I.</heading>
          </level>
        </level>
      </level>
    </root>

I need to add an order indicator to the level elements (the ones with an id) but not to the file elements, so that the result looks like this:

<root>
      <level id='FRUITS' order='1'>
        <level id='APPLE' order='1'>
          <heading>APPLE HEADER</heading>
          <level id='APPLE_ONE' order='1'>
            <file fileName='APPLE_ONE_I' />
            <heading>This is Apple One I.</heading>
            <file fileName='APPLE_ONE_II' />
            <heading>This is Apple One II.</heading>
          </level>
          <level id='APPLE_TWO' order='2'>
            <file fileName='APPLE_TWO_II' />
            <heading>This is Apple One I.</heading>
          </level>
        </level>
        <level id='ORANGE' order='2'>
          <heading>ORANGE HEADER</heading>
          <level id='ORANGE_ONE' order='1'>
            <file fileName='ORANGE_ONE_I' />
            <heading>This is Orange One I.</heading>
          </level>
        </level>
      </level>
    </root>

Best answer

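One way is to use xpath() in lxml to count the preceding-sibling level elements of each level; that count plus one is the value of "order". Note that this relies on lxml rather than the standard-library ElementTree, whose limited XPath support does not include the preceding-sibling axis.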

from lxml import etree

xml = """<root>
      <level id='FRUITS'>
        <level id='APPLE'>
          <heading>APPLE HEADER </heading>
          <level id='APPLE_ONE'>
            <file fileName='APPLE_ONE_I' />
            <heading>This is Apple One I.</heading>
            <file fileName='APPLE_ONE_II' />
            <heading>This is Apple One II.</heading>
          </level>
          <level id='APPLE_TWO'>
            <file fileName='APPLE_TWO_II' />
            <heading>This is Apple One I.</heading>
          </level>
        </level>
        <level id='ORANGE'>
          <heading>ORANGE HEADER</heading>
          <level id='ORANGE_ONE'>
            <file fileName='ORANGE_ONE_I' />
            <heading>This is Orange One I.</heading>
          </level>
        </level>
      </level>
    </root>
"""

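# fromstring() returns the root element of the parsed document
tree = etree.fromstring(xml)

# Visit every <level> at any depth; <file> and <heading> are never selected
for level in tree.xpath(".//level"):
    # order = number of <level> siblings that come before this one, plus one
    level.set("order", str(len(level.xpath("preceding-sibling::level")) + 1))

print(etree.tostring(tree).decode())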

Printed output:

<root>
      <level id="FRUITS" order="1">
        <level id="APPLE" order="1">
          <heading>APPLE HEADER </heading>
          <level id="APPLE_ONE" order="1">
            <file fileName="APPLE_ONE_I"/>
            <heading>This is Apple One I.</heading>
            <file fileName="APPLE_ONE_II"/>
            <heading>This is Apple One II.</heading>
          </level>
          <level id="APPLE_TWO" order="2">
            <file fileName="APPLE_TWO_II"/>
            <heading>This is Apple One I.</heading>
          </level>
        </level>
        <level id="ORANGE" order="2">
          <heading>ORANGE HEADER</heading>
          <level id="ORANGE_ONE" order="1">
            <file fileName="ORANGE_ONE_I"/>
            <heading>This is Orange One I.</heading>
          </level>
        </level>
      </level>
    </root>
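
Since the question asks about ElementTree specifically, here is a minimal sketch of the same idea using only the standard library. It assumes, as in the lxml version above, that "order" means the 1-based position of a level among its sibling level elements. The helper name add_order is hypothetical, and the sample XML is a shortened copy of the one in the question.

```python
import xml.etree.ElementTree as ET


def add_order(parent, tag="level"):
    """Number the direct `tag` children of `parent` from 1, then recurse."""
    position = 0
    for child in parent:
        if child.tag == tag:
            # Only matching elements advance the counter, so <file> and
            # <heading> siblings do not affect the numbering.
            position += 1
            child.set("order", str(position))
        # Recurse into every child so nested levels get their own count.
        add_order(child, tag)


xml = """<root>
  <level id='FRUITS'>
    <level id='APPLE'>
      <heading>APPLE HEADER</heading>
      <level id='APPLE_ONE'>
        <file fileName='APPLE_ONE_I' />
      </level>
      <level id='APPLE_TWO'>
        <file fileName='APPLE_TWO_II' />
      </level>
    </level>
    <level id='ORANGE'>
      <heading>ORANGE HEADER</heading>
      <level id='ORANGE_ONE'>
        <file fileName='ORANGE_ONE_I' />
      </level>
    </level>
  </level>
</root>"""

root = ET.fromstring(xml)
add_order(root)
print(ET.tostring(root, encoding="unicode"))
```

The counter restarts for each parent, so APPLE_TWO gets order 2 under APPLE while ORANGE_ONE gets order 1 under ORANGE. For a file on disk, ET.parse(path).getroot() could replace ET.fromstring(xml).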