Python minidom: How to get parent nodes of those first child nodes that have the same attribute values


I have a huge XML file. Here is just a sample:

<?xml version="1.0"?>
<data yahoo="state">
<country name="Cambodia">
    <neighbor name="Thailand" direction="W"/>
    <neighbor name="Vietnam" towards="E"/>
</country>
<country name="Singapore">
    <neighbor name="Malaysia" dimension="N"/>
</country>
<country name="Panama">
    <neighbor name="Costa Rica" dimension="W"/>
    <neighbor name="Colombia" towards="E"/>
</country>
</data>

I need to get only those parent nodes (country) whose child nodes (neighbor) share the same value (here W) across the attributes direction and dimension. The output should look like:

[<country name="Cambodia">, <country name="Panama">]

As the output shows, the result is a list of the parent nodes of those child nodes whose attributes have the same value; here direction="W" and dimension="W" match. I am an absolute beginner. This is what I am trying (it is not correct, but it shows my approach):

from xml.dom import minidom
xmldoc = minidom.parse("C:/Users/Dsduda/Desktop/Countries.xml")
x = xmldoc.getElementsByTagName("neighbor")
y = xmldoc.getElementsByTagName("neighbor")
for s in x and t in y:
    if s.attributes["direction"].value == t.attributes["dimension"].value:
          print s.parentNode, t.parentNode

There is 1 solution below


Map your neighbor nodes in a dictionary, grouping them by their direction value, then look up each dimension value in that dictionary:

from collections import defaultdict

by_direction = defaultdict(list)

# map directions: group neighbor elements by their direction value
for neighbor in xmldoc.getElementsByTagName("neighbor"):
    if not neighbor.hasAttribute('direction'):
        continue
    direction = neighbor.getAttribute('direction')
    by_direction[direction].append(neighbor)

# look up by dimension:
for neighbor in xmldoc.getElementsByTagName("neighbor"):
    if not neighbor.hasAttribute('dimension'):
        continue
    dimension = neighbor.getAttribute('dimension')
    matches = by_direction.get(dimension, [])
    if not matches:
        # no neighbor has a direction equal to this dimension
        continue
    print(neighbor.parentNode)
    for node in matches:
        print(node.parentNode)
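
To get the list the question asks for rather than printed nodes, the same two-pass idea can be wrapped in a function. The following is only a minimal sketch, assuming Python 3 and the sample XML from the question inlined as a string. The names SAMPLE and matching_countries are made up for illustration and are not part of minidom.

```python
from collections import defaultdict
from xml.dom import minidom

# Sample document from the question, inlined so the sketch is self-contained.
SAMPLE = """<?xml version="1.0"?>
<data yahoo="state">
<country name="Cambodia">
    <neighbor name="Thailand" direction="W"/>
    <neighbor name="Vietnam" towards="E"/>
</country>
<country name="Singapore">
    <neighbor name="Malaysia" dimension="N"/>
</country>
<country name="Panama">
    <neighbor name="Costa Rica" dimension="W"/>
    <neighbor name="Colombia" towards="E"/>
</country>
</data>"""


def matching_countries(doc):
    """Return the country elements, in document order, that contain a
    neighbor whose direction value equals some neighbor's dimension value
    (or the other way round)."""
    neighbors = doc.getElementsByTagName("neighbor")

    # First pass: group neighbor elements by their direction value.
    by_direction = defaultdict(list)
    for neighbor in neighbors:
        if neighbor.hasAttribute("direction"):
            by_direction[neighbor.getAttribute("direction")].append(neighbor)

    # Second pass: for each dimension value, look up matching directions
    # and remember the parents on both sides of the match.
    matched_ids = set()
    for neighbor in neighbors:
        if not neighbor.hasAttribute("dimension"):
            continue
        partners = by_direction.get(neighbor.getAttribute("dimension"), [])
        if not partners:
            continue
        matched_ids.add(id(neighbor.parentNode))
        for partner in partners:
            matched_ids.add(id(partner.parentNode))

    # Filter the country elements so the result keeps document order.
    return [country for country in doc.getElementsByTagName("country")
            if id(country) in matched_ids]


doc = minidom.parseString(SAMPLE)
print([country.getAttribute("name") for country in matching_countries(doc)])
# prints ['Cambodia', 'Panama']
```

Singapore is left out because its dimension value N has no matching direction. For a file on disk, minidom.parse(path) can replace minidom.parseString(SAMPLE).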