Find and remove sub-element in XML file

I’m new to Python so here is my problem:

XML:

<Configuration>
   <ConfiguredPaths>
       <ConfiguredPath>
           <LocalPath>C:\Temp</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
       </ConfiguredPath>
       <ConfiguredPath>
           <LocalPath>C:\Files</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
       </ConfiguredPath>
       <ConfiguredPath>
           <LocalPath>C:\DOCS</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\DOCS</EffectivePath>
       </ConfiguredPath>
   </ConfiguredPaths>
</Configuration>

What I need to do is locate the "EffectivePath" element and, if it equals a certain value, delete the whole section it belongs to. Since it is a child of "ConfiguredPath", that is the section that needs to be deleted, and only the one containing that particular effective path.

Here is an example result if EffectivePath = "\\SERVERNAME\C$\DOCS"

=> Result XML file should be as follows:

<Configuration>
   <ConfiguredPaths>
       <ConfiguredPath>
           <LocalPath>C:\Temp</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
       </ConfiguredPath>
       <ConfiguredPath>
           <LocalPath>C:\Files</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
       </ConfiguredPath>
   </ConfiguredPaths>
</Configuration>

Here is my script; however, it removes all the ConfiguredPaths (and hence its children) rather than just the required one:

import xml.etree.ElementTree as ET
tree = ET.parse('Data.xml')
root = tree.getroot()

for child in root:
    if child.tag == "ConfiguredPaths":
        for elem in child.iter():
            if elem.tag == "ConfiguredPath":
                for child_elem in child.iter():
                    if child_elem.tag == "EffectivePath" and child_elem.text == r"\\SERVERNAME\C$\DOCS":
                        print(f"Required item is:", child_elem.tag, child_elem.text)
                        root.remove(child)

tree.write('output.xml') 

Answer by Daniel Haley (best answer)

One more example using lxml instead of ElementTree (because ElementTree has limited support for XPath, and lxml also has the convenient .getparent() method).

from lxml import etree

to_remove = r"\\SERVERNAME\C$\DOCS"

tree = etree.parse("Data.xml")

# The context for tree is already /Configuration, so using a relative xpath.
for elem in tree.xpath(f"./ConfiguredPaths/ConfiguredPath[EffectivePath='{to_remove}']"):
    elem.getparent().remove(elem)

tree.write("output.xml")
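
For comparison, here is a minimal sketch of the same removal using only the standard library. ElementTree elements do not keep a reference to their parent, so one common workaround is to build a child-to-parent map first. The helper name remove_configured_paths and the shortened inline sample document are assumptions made for illustration, not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the XML in the question (raw string keeps backslashes).
SAMPLE = r"""<Configuration>
   <ConfiguredPaths>
       <ConfiguredPath>
           <LocalPath>C:\Temp</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
       </ConfiguredPath>
       <ConfiguredPath>
           <LocalPath>C:\DOCS</LocalPath>
           <EffectivePath>\\SERVERNAME\C$\DOCS</EffectivePath>
       </ConfiguredPath>
   </ConfiguredPaths>
</Configuration>"""


def remove_configured_paths(root, effective_path):
    """Remove every ConfiguredPath whose EffectivePath text equals effective_path.

    Hypothetical helper; returns the number of removed elements.
    """
    # ElementTree has no getparent(), so map each child to its parent up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # findall() returns a list, so removing elements while looping over it is safe.
    for elem in root.findall(".//ConfiguredPath"):
        if elem.findtext("EffectivePath") == effective_path:
            parent_of[elem].remove(elem)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_configured_paths(root, r"\\SERVERNAME\C$\DOCS")
remaining = [e.text for e in root.iter("EffectivePath")]
```

The parent map costs one extra pass over the tree, which is usually negligible for configuration files of this size.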

Answer by Yitzhak Khabinsky

Please try the following solution based on XSLT.

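It uses the so-called identity transform pattern: the first template copies every node and attribute unchanged, and the second, empty template matches the ConfiguredPath to remove, so that element produces no output.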

Input XML

<Configuration>
    <ConfiguredPaths>
        <ConfiguredPath>
            <LocalPath>C:\Temp</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
        </ConfiguredPath>
        <ConfiguredPath>
            <LocalPath>C:\Files</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
        </ConfiguredPath>
        <ConfiguredPath>
            <LocalPath>C:\DOCS</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\DOCS</EffectivePath>
        </ConfiguredPath>
    </ConfiguredPaths>
</Configuration>

XSLT

<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"
               encoding="UTF-8" indent="yes"/>
   <xsl:strip-space elements="*"/>

   <!--Identity transform-->
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="ConfiguredPath[EffectivePath = '\\SERVERNAME\C$\DOCS']"/>
</xsl:stylesheet>

Python

import lxml.etree as lx

# PARSE XML AND XSLT
doc = lx.parse("Input.xml")
style = lx.parse("Style.xslt")
outfile = "Output.xml"

# CONFIGURE AND RUN TRANSFORMER
transformer = lx.XSLT(style)
result = transformer(doc)

# OUTPUT TO FILE
with open(outfile, "wb") as f:
    f.write(result)

Output XML

<Configuration>
  <ConfiguredPaths>
    <ConfiguredPath>
      <LocalPath>C:\Temp</LocalPath>
      <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
    </ConfiguredPath>
    <ConfiguredPath>
      <LocalPath>C:\Files</LocalPath>
      <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
    </ConfiguredPath>
  </ConfiguredPaths>
</Configuration>
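
To make the identity-transform idea concrete without XSLT, here is a rough Python analogue using only the standard library: copy every node as-is, and let one "override" rule emit nothing for the unwanted element. This is a sketch of the pattern, not what lxml does internally; the names copy_without and is_docs_path and the compact sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def copy_without(elem, should_drop):
    """Return a deep copy of elem, skipping any descendant for which should_drop is true.

    Hypothetical helper mirroring the identity template plus one empty template.
    """
    # "Identity" part: same tag, attributes, text, and tail.
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.tail = elem.tail
    for child in elem:
        # "Empty template" part: matching nodes produce no output.
        if should_drop(child):
            continue
        clone.append(copy_without(child, should_drop))
    return clone


def is_docs_path(elem):
    # Assumed predicate playing the role of the XSLT match pattern.
    return (elem.tag == "ConfiguredPath"
            and elem.findtext("EffectivePath") == r"\\SERVERNAME\C$\DOCS")


source = ET.fromstring(
    "<Configuration><ConfiguredPaths>"
    r"<ConfiguredPath><EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath></ConfiguredPath>"
    r"<ConfiguredPath><EffectivePath>\\SERVERNAME\C$\DOCS</EffectivePath></ConfiguredPath>"
    "</ConfiguredPaths></Configuration>"
)
result = copy_without(source, is_docs_path)
kept = [e.text for e in result.iter("EffectivePath")]
```

As with the XSLT version, the input tree is left untouched and a filtered copy is produced.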

Answer by balderman

Just use the core Python XML library (xml.etree.ElementTree).

import xml.etree.ElementTree as ET

# Raw string so the backslashes in the paths are kept exactly as written.
xml = r'''<Configuration>
    <ConfiguredPaths>
        <ConfiguredPath>
            <LocalPath>C:\Temp</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
        </ConfiguredPath>
        <ConfiguredPath>
            <LocalPath>C:\Files</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
        </ConfiguredPath>
        <ConfiguredPath>
            <LocalPath>C:\DOCS</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\DOCS</EffectivePath>
        </ConfiguredPath>
    </ConfiguredPaths>
</Configuration>'''

root = ET.fromstring(xml)
cp_root = root.find('.//ConfiguredPaths')
# findall() returns a list, so removing from cp_root while looping is safe.
for cp in cp_root.findall('ConfiguredPath'):
    if cp.findtext('EffectivePath') == r'\\SERVERNAME\C$\DOCS':
        cp_root.remove(cp)

ET.indent(root, space='    ')  # Python 3.9+; tidies whitespace left by the removal
ET.dump(root)

output

<Configuration>
    <ConfiguredPaths>
        <ConfiguredPath>
            <LocalPath>C:\Temp</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
        </ConfiguredPath>
        <ConfiguredPath>
            <LocalPath>C:\Files</LocalPath>
            <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
        </ConfiguredPath>
    </ConfiguredPaths>
</Configuration>
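
One detail worth checking when comparing against UNC paths like these: in an ordinary (non-raw) Python string literal, a doubled backslash denotes a single backslash, so a path typed with two leading backslashes ends up with only one. A small sketch of the difference, using the path from the question (the variable names are illustrative only):

```python
raw = r"\\SERVERNAME\C$\DOCS"         # raw literal: every backslash is kept
escaped = "\\\\SERVERNAME\\C$\\DOCS"  # regular literal with each backslash doubled
lossy = "\\SERVERNAME\\C$\\DOCS"      # only ONE leading backslash survives here

same = raw == escaped      # both spell the path with two leading backslashes
differs = raw != lossy     # the third spelling is one character shorter
length_gap = len(raw) - len(lossy)
```

Using a raw string for both the XML text and the value being compared keeps the data identical to what is stored in the file.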

Answer by Hermann12

With xml.etree.ElementTree you can do:

import xml.etree.ElementTree as ET

tree = ET.parse('config.xml')
root = tree.getroot()

# Visit each parent that has ConfiguredPath children, so remove() is always
# called on the element that actually contains the child.
for parent in root.findall(".//ConfiguredPath/.."):
    for child in parent.findall("ConfiguredPath"):
        if child.findtext('EffectivePath') == r"\\SERVERNAME\C$\DOCS":
            parent.remove(child)

ET.indent(root, space="  ")  # requires Python 3.9+
tree.write('out.xml', xml_declaration=True)
ET.dump(root)

Output:

<Configuration>
  <ConfiguredPaths>
    <ConfiguredPath>
      <LocalPath>C:\Temp</LocalPath>
      <EffectivePath>\\SERVERNAME\C$\Temp</EffectivePath>
    </ConfiguredPath>
    <ConfiguredPath>
      <LocalPath>C:\Files</LocalPath>
      <EffectivePath>\\SERVERNAME\C$\Files</EffectivePath>
    </ConfiguredPath>
  </ConfiguredPaths>
</Configuration>