Python XML Processing in minidom

I have the following very simple XML file and I want to quickly parse the imsi elements for each device using minidom.

    <device>
      <imsi>01010101</imsi>  
    </device>
    <device>
      <imsi>123456</imsi>
    </device>
    <device>
      <imsi>9876543</imsi>
    </device>

Code snippet for parsing:

    import xml.dom.minidom

    doc = xml.dom.minidom.parse("./input.xml")

    for node in doc.getElementsByTagName("device"):
        imsi = node.getElementsByTagName("imsi")
        print str(imsi)

When I execute the above code snippet, I get the error below in the terminal. What should I change in the code to parse the imsi elements for each device? Thanks.

        doc = xml.dom.minidom.parse("./input.xml")
      File "/usr/lib/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
        return expatbuilder.parse(file)
      File "/usr/lib/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 926, in parse
        result = builder.parseFile(fp)
      File "/usr/lib/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    xml.parsers.expat.ExpatError: junk after document element: line 4, column 0

After I introduced a root node, I ran the following code, which printed the element objects instead of the IMSI values. What is wrong here?

    doc = xml.dom.minidom.parse("./input.xml")
    for node in doc.getElementsByTagName("device"):
        imsi = node.getElementsByTagName("imsi")
        print str(imsi)

    [<DOM Element: imsi at 0x828636c>]
    [<DOM Element: imsi at 0x82864ac>]
    [<DOM Element: imsi at 0x828660c>]

The following code solved my problem and printed the IMSI values properly:

    for node in doc.getElementsByTagName("device"):
        imsi = node.getElementsByTagName("imsi")
        for a in imsi:
            # The text of <imsi> is stored in its first child, a text node.
            title = a.firstChild.data
            print title
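
For reference, here is a minimal Python 3 sketch of the same loop. It assumes the XML has already been wrapped in a root element (named `devices` here purely for illustration) and parses it from a string rather than from `./input.xml` so that it is self-contained. `getElementsByTagName` returns a list-like `NodeList` of `Element` objects, which is why printing it shows `<DOM Element: imsi at 0x...>`; the text itself lives in the element's child text node. The helper name `extract_imsis` is hypothetical.

```python
import xml.dom.minidom

# Sample data from the question, wrapped in a root element.
# The root name "devices" is an assumption for illustration.
XML_TEXT = """<devices>
  <device>
    <imsi>01010101</imsi>
  </device>
  <device>
    <imsi>123456</imsi>
  </device>
  <device>
    <imsi>9876543</imsi>
  </device>
</devices>"""


def extract_imsis(xml_text):
    """Return the text of every <imsi> element found under each <device>."""
    doc = xml.dom.minidom.parseString(xml_text)
    imsis = []
    for device in doc.getElementsByTagName("device"):
        for imsi in device.getElementsByTagName("imsi"):
            # firstChild is the text node inside <imsi>; skip empty elements.
            if imsi.firstChild is not None:
                imsis.append(imsi.firstChild.data.strip())
    return imsis


if __name__ == "__main__":
    for value in extract_imsis(XML_TEXT):
        print(value)
```

Returning a list instead of printing inside the loop keeps the parsing step separate from whatever is done with the values afterwards.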

There are 2 best solutions below

Your sample is not a well-formed XML document because it has no single root element. Insert one to get something like:

    <devices>
      <device>
        <imsi>01010101</imsi>
      </device>
      <device>
        <imsi>123456</imsi>
      </device>
      <device>
        <imsi>9876543</imsi>
      </device>
    </devices>
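
As a sketch of why the root element matters: the parser raises `ExpatError` on the original fragment because a second top-level element follows the first one. If editing the file is not an option, one possible workaround is to wrap the text in a root element in memory before parsing. The helper names `parse_fragment` and `is_well_formed` and the root name `devices` are hypothetical, and the approach assumes the input carries no XML declaration or doctype of its own.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Rootless fragment in the shape of the question's input file.
FRAGMENT = """<device>
  <imsi>01010101</imsi>
</device>
<device>
  <imsi>123456</imsi>
</device>
"""


def is_well_formed(text):
    """Return True if minidom can parse the text, False on an ExpatError."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


def parse_fragment(fragment, root="devices"):
    """Wrap a rootless fragment in a single root element and parse it."""
    wrapped = "<{0}>{1}</{0}>".format(root, fragment)
    return xml.dom.minidom.parseString(wrapped)


if __name__ == "__main__":
    print(is_well_formed(FRAGMENT))  # the bare fragment is rejected
    doc = parse_fragment(FRAGMENT)
    print(len(doc.getElementsByTagName("device")))
```

Fixing the file itself is usually the cleaner option; the in-memory wrapper is only a fallback for input that cannot be changed.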

Your XML is not well-formed. Insert a root node.

You can check it with the W3C Markup Validator.

    <document>
      <device>
        <imsi>01010101</imsi>
      </device>
      <device>
        <imsi>123456</imsi>
      </device>
      <device>
        <imsi>9876543</imsi>
      </device>
    </document>

If you want your XML to be fully valid, then add a document type declaration to it.
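
A rough sketch of what such a declaration could look like, embedded in a Python 3 string so it can be parsed with minidom. The content model below (a `document` holding any number of `device` elements, each with one `imsi`) is an assumption inferred from the sample data, not something stated in the question. Note that minidom is built on the non-validating expat parser: it records the doctype but does not check the document against it, so actual validation needs a separate validating tool.

```python
import xml.dom.minidom

# Internal DTD subset; the element declarations are assumed for illustration.
XML_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE document [
  <!ELEMENT document (device*)>
  <!ELEMENT device (imsi)>
  <!ELEMENT imsi (#PCDATA)>
]>
<document>
  <device>
    <imsi>01010101</imsi>
  </device>
  <device>
    <imsi>123456</imsi>
  </device>
</document>"""

doc = xml.dom.minidom.parseString(XML_WITH_DTD)

# The doctype is available on the parsed document, but it is not enforced.
doctype_name = doc.doctype.name
device_count = len(doc.getElementsByTagName("device"))

if __name__ == "__main__":
    print(doctype_name, device_count)
```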