I have the below xml structure, and I need to get data in below format to be exported to spreadsheet Output to be written in spreadsheet
Image of expected spreadsheet output
Source XML Data:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root testAttr="testValue">
<family name="Hardwood">
<child name="Jack">First</child>
<child name="Rose">Second</child>
<child name="Blue Ivy">Third
<grandchildren>
<data>One</data>
<data>Two</data>
<unique>Twins</unique>
</grandchildren>
</child>
<child name="Jane">Fourth</child>
</family>
<family name="Downie">
<child name="Bill">First</child>
<child name="Rosie">Second</child>
<child name="Edward">
Third
</child>
<child name="Jane">Fourth</child>
</family>
</root>
I have tried the below however not able to enter into grandchild tag.
import xml.dom.minidom
doc=xml.dom.minidom.parse('Sample_XML.xml')
children = doc.getElementsByTagName('family')
for child in children:
print(child.getAttribute('name'))
print(child.getElementsByTagName('child')[0].childNodes[0].nodeValue)
print(child.getElementsByTagName('child')[1].childNodes[0].nodeValue)
print(child.getElementsByTagName('child')[2].childNodes[0].nodeValue)
print(child.getElementsByTagName('child')[2].childNodes[0].nodeValue)
There may be a simpler way of doing it, but it can be done if you use lxml instead of minidom, because of the former's better support of xpath. lxml parser the document and creates a dataframe which can be saved as-is, as csv or as an Excel file. It's not simple (because your xml is deeply nested), but if pretty self explanatory, I believe, if you know xpath. If not, you should read up on it...
Output should look like this (pardon the formatting):