Iterate through attribute values with minidom

I have some XML that looks like this:

<topic>
    <restrictions>
        <restriction id="US"/>
        <restriction id="CA"/>
        <restriction id="EU"/>
    </restrictions>
</topic>
<topic>
    <restrictions>
        <restriction id="JP"/>
        <restriction id="AU"/>
        <restriction id="EU"/>
        <restriction id="US"/>
    </restrictions>
</topic>

Other topics follow the same pattern. I'm already using minidom in my script to do some other things with the XML. For the example above, I need to get the following result:

[['US','CA','EU'],['JP','AU','EU','US']]

I have tried several approaches, but each one gives an incorrect result. This is my code:

from xml.dom import minidom

xmldoc = minidom.parse(path_to_file)
itemlist = xmldoc.getElementsByTagName('restrictions')
itemlist2 = xmldoc.getElementsByTagName('restriction')


restrictions=[]

for x in itemlist:
    res=[]
    for s in itemlist2:
        res.append(s.attributes['id'].value)

    restrictions.append(res)

print(restrictions)

Can you please help me get the iteration right? Any help is appreciated. Thanks!

EDIT: I just realized there is another case I need to account for. A topic element may have no restrictions element at all; when that happens, the value appended to the list should be just 0. What is an easy way to handle that condition?

longhua On BEST ANSWER

getElementsByTagName returns all descendant elements with the given tag name. Called on the document, it searches the whole file, so itemlist2 contains every restriction node in the XML. Your code therefore appends the ids of all of these nodes, ['US','CA','EU','JP','AU','EU','US'], once for each restrictions node. Instead, get the restriction nodes for each restrictions node separately, inside the loop.
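
To make the scoping difference concrete, here is a minimal sketch. It assumes a shortened version of your XML wrapped in a hypothetical <topics> root element (an XML document needs a single root), and it uses parseString so it runs without a file:

```python
from xml.dom import minidom

# Hypothetical sample: a shortened version of the question's XML, wrapped in
# an assumed <topics> root element because XML requires a single root.
SAMPLE = """<topics>
<topic><restrictions><restriction id="US"/><restriction id="CA"/></restrictions></topic>
<topic><restrictions><restriction id="JP"/></restrictions></topic>
</topics>"""

doc = minidom.parseString(SAMPLE)

# Called on the document: every <restriction> element in the whole file.
all_ids = [n.getAttribute('id') for n in doc.getElementsByTagName('restriction')]

# Called on one <topic> element: only the <restriction> elements beneath it.
first_topic = doc.getElementsByTagName('topic')[0]
first_ids = [n.getAttribute('id') for n in first_topic.getElementsByTagName('restriction')]

print(all_ids)    # ['US', 'CA', 'JP']
print(first_ids)  # ['US', 'CA']
```

Applying the same per-element call inside a loop over each topic gives the code below.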

from xml.dom import minidom

# path_to_file is the path to your XML file, as in the question
xmldoc = minidom.parse(path_to_file)
restrictions = []
topic_nodes = xmldoc.getElementsByTagName('topic')
for topic_node in topic_nodes:
    # Search only inside the current topic, not the whole document
    restrictions_nodes = topic_node.getElementsByTagName('restrictions')
    if not restrictions_nodes:
        # No restrictions element in this topic: record 0 instead of a list
        restrictions.append(0)
        continue

    result = []
    for restrictions_node in restrictions_nodes:
        # Only the restriction elements under this restrictions element
        restriction_nodes = restrictions_node.getElementsByTagName('restriction')
        for restriction_node in restriction_nodes:
            result.append(restriction_node.attributes['id'].value)

    restrictions.append(result)

print(restrictions)
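
If you want to try this without a file, the same logic can be sketched as a function that takes the XML as a string. Note that collect_restrictions, the <topics> root element, and the third, empty topic are assumptions added for illustration; they are not part of your data:

```python
from xml.dom import minidom


def collect_restrictions(xml_text):
    """Return one list of restriction ids per <topic>, or 0 when the topic
    has no <restrictions> element. (Hypothetical helper name.)"""
    doc = minidom.parseString(xml_text)
    collected = []
    for topic in doc.getElementsByTagName('topic'):
        restrictions_nodes = topic.getElementsByTagName('restrictions')
        if not restrictions_nodes:
            collected.append(0)
            continue
        # Gather ids only from <restriction> elements inside this topic's
        # <restrictions> element(s).
        collected.append([
            restriction.getAttribute('id')
            for restrictions_node in restrictions_nodes
            for restriction in restrictions_node.getElementsByTagName('restriction')
        ])
    return collected


# Assumed sample: the question's XML inside a <topics> root, plus one topic
# without a <restrictions> element to exercise the 0 case.
SAMPLE = """<topics>
<topic>
    <restrictions>
        <restriction id="US"/>
        <restriction id="CA"/>
        <restriction id="EU"/>
    </restrictions>
</topic>
<topic>
    <restrictions>
        <restriction id="JP"/>
        <restriction id="AU"/>
        <restriction id="EU"/>
        <restriction id="US"/>
    </restrictions>
</topic>
<topic/>
</topics>"""

print(collect_restrictions(SAMPLE))
# [['US', 'CA', 'EU'], ['JP', 'AU', 'EU', 'US'], 0]
```

One thing to be aware of: a topic with an empty restrictions element produces an empty list here, not 0. If you want 0 in that case as well, append `ids or 0` after building the list.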