I am trying to parse XML and am having a hard time. I don't understand why the results keep printing [<Element 'Results' at 0x105fc6110>].
I am trying to extract Social from my example with the following code:
import xml.etree.ElementTree as ET
root = ET.parse("test.xml")
results = root.findall("Results")
print(results)  # [<Element 'Results' at 0x105fc6110>]
# WHAT IS THIS??
for result in results:
    print(result.find("Social"))  # None
The XML looks like this:
<?xml version="1.0"?>
<List1>
  <NextOffset>AAA</NextOffset>
  <Results>
    <R>
      <D>internet.com</D>
      <META>
        <Social>
          <v>http://twitter.com/internet</v>
          <v>http://facebook.com/internet</v>
        </Social>
        <Telephones>
          <v>+1-555-555-6767</v>
        </Telephones>
      </META>
    </R>
  </Results>
</List1>
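findall returns a list of xml.etree.ElementTree.Element objects. In your case, you only have one Results node, so you could use find to look for the first/unique match.
Once you have it, you have to use find with the .// syntax, which searches anywhere in the subtree, not only among the direct children of Results. That is why result.find("Social") returns None: Social is nested under R and META, not directly under Results.
Once you have found it, just call findall on the v tag and print the text of each match. For your file, this results in the two URLs listed under Social.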
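The steps above can be sketched roughly as follows. This is only an illustration, not necessarily the exact code I ran: it assumes Python 3, and it inlines the XML from the question as a string (without the XML declaration) so the snippet is self-contained. With your file you would use ET.parse("test.xml") instead. The names XML, social and links are mine.

```python
import xml.etree.ElementTree as ET

# Assumption: the XML from the question, inlined for a self-contained sketch.
# With a file on disk, use: root = ET.parse("test.xml").getroot()
XML = """<List1>
  <NextOffset>AAA</NextOffset>
  <Results>
    <R>
      <D>internet.com</D>
      <META>
        <Social>
          <v>http://twitter.com/internet</v>
          <v>http://facebook.com/internet</v>
        </Social>
        <Telephones>
          <v>+1-555-555-6767</v>
        </Telephones>
      </META>
    </R>
  </Results>
</List1>"""

root = ET.fromstring(XML)

# There is only one Results node, so find() is enough here.
results = root.find("Results")

# ".//Social" searches every descendant of Results, not just direct children.
social = results.find(".//Social")

# Collect the text of each <v> element directly under Social.
links = [v.text for v in social.findall("v")]
for link in links:
    print(link)
```

If you only want the Social links, the two find calls could probably be collapsed into a single root.find(".//Social"), but keeping them separate mirrors the explanation above.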
Note that I did not perform any validity check on the XML file. You should check whether the find method returns None and handle the error accordingly.
Note that even though I'm not very confident with the XML format myself, I learned all that I know about parsing it by following this lxml tutorial.
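As for the None check mentioned above, here is one possible way to do it. It is a minimal sketch assuming Python 3; the helper name social_links and the two small test documents are hypothetical, and returning an empty list is just one way of handling a missing node (raising an exception would be equally reasonable).

```python
import xml.etree.ElementTree as ET


def social_links(root):
    """Return the text of each <v> under Social, or [] if a node is missing.

    Hypothetical helper, for illustration only.
    """
    results = root.find("Results")
    # Compare with "is None" explicitly: an Element without children is
    # falsy, so a plain "if not results" could give the wrong answer.
    if results is None:
        return []
    social = results.find(".//Social")
    if social is None:
        return []
    return [v.text for v in social.findall("v")]


# Two tiny made-up documents: one with a Social node, one without Results.
good = ET.fromstring(
    "<List1><Results><R><META><Social>"
    "<v>http://twitter.com/internet</v>"
    "</Social></META></R></Results></List1>"
)
bad = ET.fromstring("<List1><NextOffset>AAA</NextOffset></List1>")

print(social_links(good))
print(social_links(bad))
```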