Most efficient way to convert one XML file into a different XML format in Python (xmltodict, ElementTree, etc.)


Howdie do,

So I have the following two XML files.

File A:

<?xml version="1.0" encoding="UTF-8"?>
<GetShipmentUpdatesResult>
    <Shipments>
        <Shipment>
            <Container>
                <OrderNumber>5108046</OrderNumber>
                <ContainerNumber>5108046_1</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-12T12:00:00</ShipDate>
                <CarrierName>UPS</CarrierName>
                <TrackingNumber>1ZX20520A803682850</TrackingNumber>
                <StatusCode>InTransit</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T13:53:18</TimeStamp>
                        <City></City>
                        <StateOrProvince></StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T18:47:44</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>Status: AF Recorded</Description>
                        <TrackingStatus>In Transit</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
        <Shipment>
            <Container>
                <OrderNumber>456789</OrderNumber>
                <ContainerNumber>44789</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-03T13:56:27</ShipDate>
                <CarrierName>UP2</CarrierName>
                <TrackingNumber>1Z4561230020</TrackingNumber>
                <StatusCode>IN_TRANSIT</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-07-03T13:56:27</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
    </Shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId></RequestId>
    <RecordsRemaining>0</RecordsRemaining>
</GetShipmentUpdatesResult>

File B:

<?xml version="1.0" encoding="UTF-8"?>
<getShipmentStatusResponse>
    <getShipmentStatusResult>
        <outcome>
            <result>Success</result>
            <error></error>
        </outcome>
        <shipments>
            <shipment>
                <orderID>123456</orderID>
                <containerNo>CD1863663C</containerNo>
                <shipDate>2015-06-29T18:47:44</shipDate>
                <carrier>UPS</carrier>
                <trackingNumber>1Z4561230001</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T13:53:18</timeStamp>
                        <city />
                        <state />
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T18:47:44</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Shipped from warehouse</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
            <shipment>
                <orderID>456789</orderID>
                <containerNo>44789</containerNo>
                <shipDate>2015-07-03T13:56:27</shipDate>
                <carrier>UP2</carrier>
                <trackingNumber>1Z4561230020</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-07-03T13:56:27</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
        </shipments>
        <matchingRecords>2</matchingRecords>
        <requestId></requestId>
        <remainingRecords>0</remainingRecords>
    </getShipmentStatusResult>
</getShipmentStatusResponse>

I basically need to read through File A and change it to look like File B. I've been using xmltodict to parse File A, but it only reads the top-level element. It seems I would have to write multiple nested for loops to achieve this with xmltodict: one loop over each parent element and another over its child elements.

Looking at ElementTree, the same appears to be true. Does anyone know another way to do this without writing multiple for loops?

Best answer:

Since your output is more or less an exact mapping of the input (only the element names seem to differ), I suggest you use XSLT to do the transformation declaratively.

Assuming that each input element name maps unconditionally to exactly one output element name (that's what it looks like, judging by your sample), here's an XSLT 1.0 transformation to get you started (basic instructions on how to use XSLT in Python can be found in this answer):

<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://tempuri.org/config"
  exclude-result-prefixes="my"
>
  <xsl:output method="xml" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*" />

  <my:config>
    <nameMap from="Shipments" to="shipments" />
    <nameMap from="Shipment" to="shipment" />
    <nameMap from="Container" to="-" />
  </my:config>
  <xsl:variable name="nameMap" select="document('')/*/my:config/nameMap" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <getShipmentStatusResponse>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResponse>
  </xsl:template>

  <xsl:template match="GetShipmentUpdatesResult">
    <getShipmentStatusResult>
      <outcome>
        <result>Success</result>
        <error></error>
      </outcome>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResult>
  </xsl:template>

  <xsl:template match="*">
    <xsl:variable name="map" select="$nameMap[@from = name(current())]" />
    <xsl:choose>
      <xsl:when test="$map/@to = '-'">
        <xsl:apply-templates select="@* | node()" />
      </xsl:when>
      <xsl:when test="$map/@to != ''">
        <xsl:element name="{$map/@to}">
          <xsl:apply-templates select="@* | node()" />
        </xsl:element>
      </xsl:when>
      <xsl:when test="$map/@to = ''" />
      <xsl:otherwise>
        <xsl:call-template name="identity" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:transform>

The transformation approaches the problem as follows:

  • At its core, the identity transform is at work: Any node that does not match a specialized template will be copied to the output as-is.
  • It contains an in-place config section (<my:config>) where you can place <nameMap> elements for mapping input names to output names. This works through the following convention (implemented in the <xsl:template match="*"> a few lines down):

    • if an input element matches any @from and the @to is filled in, the element will be renamed and its children will be processed
    • if an input element matches any @from and the @to is '-', the element will be removed but its children will still be processed.
    • if an input element matches any @from and the @to is empty, it will be removed from the output completely
    • in all other cases the input element will be copied 1:1, via the identity template.
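For readers who would rather stay in plain Python, the same four-way convention can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration of the rules listed above, not the XSLT answer itself: the NAME_MAP table, the convert function, and the choice to drop CustomerOrderNumber are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table mirroring the <nameMap> convention:
#   "new name" -> rename the element and process its children
#   "-"        -> drop the element but keep its (processed) children
#   ""         -> drop the element together with its whole subtree
# Names that are not listed are copied unchanged.
NAME_MAP = {
    "Shipments": "shipments",
    "Shipment": "shipment",
    "Container": "-",
    "CustomerOrderNumber": "",  # assumption: File B has no such element
}


def convert(elem, name_map):
    """Return the list of output elements produced from one input element."""
    target = name_map.get(elem.tag, elem.tag)
    if target == "":
        return []
    # Recurse once; this single function handles any nesting depth.
    children = [out for child in elem for out in convert(child, name_map)]
    if target == "-":
        return children
    new = ET.Element(target, dict(elem.attrib))
    # Keep real text, discard whitespace-only indentation.
    new.text = elem.text if elem.text and elem.text.strip() else None
    new.extend(children)
    return [new]


SAMPLE = """
<Shipments>
  <Shipment>
    <Container>
      <OrderNumber>5108046</OrderNumber>
      <CustomerOrderNumber>abcq123</CustomerOrderNumber>
    </Container>
  </Shipment>
</Shipments>
"""

(result,) = convert(ET.fromstring(SAMPLE), NAME_MAP)
print(ET.tostring(result, encoding="unicode"))
```

The recursion replaces the hand-written nested loops from the question: adding a rule means adding a dictionary entry, much like adding a <nameMap> element to the stylesheet.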

With only these three rules in place, the output currently looks as shown below. Add more <nameMap> rules to define the behavior for the rest of the input elements.

<getShipmentStatusResponse>
  <getShipmentStatusResult>
    <outcome>
      <result>Success</result>
      <error />
    </outcome>
    <shipments>
      <shipment>
        <OrderNumber>5108046</OrderNumber>
        <ContainerNumber>5108046_1</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-12T12:00:00</ShipDate>
        <CarrierName>UPS</CarrierName>
        <TrackingNumber>1ZX20520A803682850</TrackingNumber>
        <StatusCode>InTransit</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-06-29T13:53:18</TimeStamp>
            <City />
            <StateOrProvince />
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
          <TrackingEvent>
            <TimeStamp>2015-06-29T18:47:44</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>Status: AF Recorded</Description>
            <TrackingStatus>In Transit</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
      <shipment>
        <OrderNumber>456789</OrderNumber>
        <ContainerNumber>44789</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-03T13:56:27</ShipDate>
        <CarrierName>UP2</CarrierName>
        <TrackingNumber>1Z4561230020</TrackingNumber>
        <StatusCode>IN_TRANSIT</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-07-03T13:56:27</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
    </shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId />
    <RecordsRemaining>0</RecordsRemaining>
  </getShipmentStatusResult>
</getShipmentStatusResponse>
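As a rough sketch of how a stylesheet can be run from Python, the snippet below uses the third-party lxml package (an assumption, since the original only points to another answer for this step). To stay short and self-contained it embeds a tiny stand-in stylesheet, an identity transform plus a single rename rule, rather than the full transformation above. The apply_xslt helper and the sample input are made up for the example.

```python
from lxml import etree  # third-party package: pip install lxml

# Tiny stand-in stylesheet: identity transform plus one rename rule.
XSLT_SOURCE = """\
<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Shipment">
    <shipment>
      <xsl:apply-templates select="@* | node()" />
    </shipment>
  </xsl:template>
</xsl:transform>
"""

INPUT_XML = (
    "<Shipments><Shipment>"
    "<OrderNumber>5108046</OrderNumber>"
    "</Shipment></Shipments>"
)


def apply_xslt(xslt_text, xml_text):
    """Compile the stylesheet and apply it to the input document."""
    transform = etree.XSLT(etree.fromstring(xslt_text))
    return transform(etree.fromstring(xml_text))


result = apply_xslt(XSLT_SOURCE, INPUT_XML)
print(etree.tostring(result, pretty_print=True).decode())
```

For a stylesheet and input stored on disk, both documents could be loaded with etree.parse instead of etree.fromstring; the rest of the call sequence stays the same.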