Iterate through attribute values with minidom


I have some XML that looks like this:

<topic>
    <restrictions>
        <restriction id="US"/>
        <restriction id="CA"/>
        <restriction id="EU"/>
    </restrictions>
</topic>
<topic>
    <restrictions>
        <restriction id="JP"/>
        <restriction id="AU"/>
        <restriction id="EU"/>
        <restriction id="US"/>
    </restrictions>
</topic>

The file continues with more topic elements that follow the same pattern. I'm already using minidom in my script to do some other things with the XML. For the example above, I need the following result:

[['US','CA','EU'],['JP','AU','EU','US']]

I have tried several variations of the loop, but all of them give an incorrect result. This is my current code:

from xml.dom import minidom

xmldoc = minidom.parse(path_to_file)
itemlist = xmldoc.getElementsByTagName('restrictions')
itemlist2 = xmldoc.getElementsByTagName('restriction')


restrictions=[]

for x in itemlist:
    res=[]
    for s in itemlist2:
        res.append(s.attributes['id'].value)

    restrictions.append(res)

print(restrictions)

Can you please help me get the iteration right? Any help is appreciated. Thanks!

EDIT: I just realized there is another case I need to account for. A topic element may not have a <restrictions> element at all, and when that happens, the value appended to the list should be just 0. What is an easy way to handle that condition?

Best answer:

getElementsByTagName returns every descendant element with the given tag name. Because you call it on the whole document, itemlist2 contains all restriction nodes in the XML, so your inner loop appends all of their ids, ['US','CA','EU','JP','AU','EU','US'], once for each restrictions node. Instead, look up the restriction nodes of each restrictions node separately inside the loop. A topic that has no restrictions element gets 0, as asked in your edit.
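To make the scoping difference concrete, here is a minimal sketch that calls getElementsByTagName once on the document and once on a single element. The SAMPLE string and its <root> wrapper are assumptions added only so that the snippet parses on its own; they are not part of the original file.

```python
from xml.dom import minidom

# Hypothetical sample data; the <root> wrapper is an assumption so the
# string is a well-formed document with a single root element.
SAMPLE = """<root>
<topic><restrictions><restriction id="US"/><restriction id="CA"/></restrictions></topic>
<topic><restrictions><restriction id="JP"/></restrictions></topic>
</root>"""

doc = minidom.parseString(SAMPLE)

# Called on the document: every <restriction> in the whole file.
all_ids = [n.getAttribute("id") for n in doc.getElementsByTagName("restriction")]

# Called on one element: only the <restriction> descendants of that element.
first_topic = doc.getElementsByTagName("topic")[0]
first_ids = [n.getAttribute("id") for n in first_topic.getElementsByTagName("restriction")]

print(all_ids)    # ['US', 'CA', 'JP']
print(first_ids)  # ['US', 'CA']
```

The corrected loop that follows relies on the second, element-scoped form of the call for each topic.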

from xml.dom import minidom

xmldoc = minidom.parse(path_to_file)
restrictions = []

# Walk the topics one at a time so each lookup is scoped to a single topic.
for topic_node in xmldoc.getElementsByTagName('topic'):
    restrictions_nodes = topic_node.getElementsByTagName('restrictions')
    if not restrictions_nodes:
        # No <restrictions> element in this topic: record 0 instead of a list.
        restrictions.append(0)
        continue

    result = []
    for restrictions_node in restrictions_nodes:
        # Only the <restriction> elements below this <restrictions> node.
        for restriction_node in restrictions_node.getElementsByTagName('restriction'):
            result.append(restriction_node.attributes['id'].value)

    restrictions.append(result)

print(restrictions)
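For trying this out without a file on disk, the same logic can be wrapped in a function and fed an inline string. This is only a sketch: the function name collect_restrictions, the SAMPLE data, and the <topics> wrapper element are assumptions introduced for illustration, and it reads the attribute with getAttribute rather than attributes['id'].value.

```python
from xml.dom import minidom


def collect_restrictions(xml_text):
    """Return one entry per <topic>: a list of restriction ids,
    or 0 when the topic has no <restrictions> element."""
    doc = minidom.parseString(xml_text)
    collected = []
    for topic in doc.getElementsByTagName("topic"):
        groups = topic.getElementsByTagName("restrictions")
        if not groups:
            # An empty NodeList is falsy, so this topic has no <restrictions>.
            collected.append(0)
            continue
        # Gather ids only from <restriction> elements below this topic's groups.
        ids = [
            r.getAttribute("id")
            for group in groups
            for r in group.getElementsByTagName("restriction")
        ]
        collected.append(ids)
    return collected


# Hypothetical sample; the <topics> wrapper is assumed so the string has one root.
SAMPLE = """<topics>
  <topic>
    <restrictions>
      <restriction id="US"/>
      <restriction id="CA"/>
      <restriction id="EU"/>
    </restrictions>
  </topic>
  <topic/>
  <topic>
    <restrictions>
      <restriction id="JP"/>
      <restriction id="AU"/>
    </restrictions>
  </topic>
</topics>"""

print(collect_restrictions(SAMPLE))  # [['US', 'CA', 'EU'], 0, ['JP', 'AU']]
```

Note that a topic with an empty <restrictions/> element yields an empty list rather than 0, because the 0 case is tied to the element being absent.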