python feedparser

How would you parse XML data like the following with the Python feedparser library?

<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>

BEST ANSWER

As Lennart Regebro mentioned, this doesn't look like an RSS/Atom feed; it is just a plain XML document. The Python standard library has several XML parsing facilities (both SAX and DOM). I recommend ElementTree. Among third-party libraries, lxml is the best one, and its etree module is largely a drop-in replacement for ElementTree.
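For comparison with the tree-based approach, here is a minimal sketch of the event-driven SAX route, using only the standard library's `xml.sax` module on the sample document from the question. `DisplayNameHandler` is a hypothetical name chosen for illustration; the handler simply buffers text between the start and end events of each `<Display_Name>` element.

```python
import xml.sax

# The question's sample document, as bytes for xml.sax.parseString().
DOC = b"""<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>"""


class DisplayNameHandler(xml.sax.ContentHandler):
    """Collects the text of every <Display_Name> element as the parser streams through."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # a list only while we are inside <Display_Name>

    def startElement(self, name, attrs):
        if name == "Display_Name":
            self._buffer = []

    def characters(self, content):
        # characters() may fire several times for one text node, so collect the pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Display_Name":
            self.names.append("".join(self._buffer))
            self._buffer = None


handler = DisplayNameHandler()
xml.sax.parseString(DOC, handler)
print(handler.names)  # ['Jason', 'John Smith']
```

SAX never builds the whole tree in memory, which can matter for very large documents, but for a small document like this one ElementTree is usually simpler to work with.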

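try:
    from lxml import etree
except ImportError:
    try:
        # C-accelerated module; deprecated since Python 3.3 and removed in 3.9
        import xml.etree.cElementTree as etree
    except ImportError:
        # stdlib fallback (uses the C accelerator automatically on Python 3.3+)
        import xml.etree.ElementTree as etree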

doc = """<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>"""
xml_doc = etree.fromstring(doc)
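The snippet above stops once the document is parsed. A minimal sketch of one way to pull out the contributor names follows. It uses the standard library's `xml.etree.ElementTree` so it runs without lxml installed; as far as I know, `findall()` and `iter()` behave the same way on `lxml.etree` elements.

```python
import xml.etree.ElementTree as etree  # lxml.etree supports the same calls used below

doc = """<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>"""

xml_doc = etree.fromstring(doc)  # the root <Book_API> element

# Path relative to the root: each <Display_Name> child of a <Contributor_List> child.
names = [el.text for el in xml_doc.findall("Contributor_List/Display_Name")]
print(names)  # ['Jason', 'John Smith']

# Alternatively, visit every <Display_Name> at any depth in the tree.
for el in xml_doc.iter("Display_Name"):
    print(el.text)
```

`findall()` with an explicit path is stricter about where the elements sit, while `iter()` is handy when the nesting depth may vary.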

That doesn't look like any sort of RSS/Atom feed. I wouldn't use feedparser for that at all; I would use lxml. In fact, feedparser can't make any sense of it and drops the "Jason" contributor in your example.

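from lxml import etree

# etree.parse() accepts a filename, a file-like object, or a URL.
# "books.xml" is a hypothetical placeholder: fetch the data however suits you.
tree = etree.parse("books.xml")
root = tree.getroot()  # parse() returns an ElementTree; getroot() gives the <Book_API> element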

Now you have a tree of XML elements. Exactly how to proceed in lxml is impossible to say until you actually give valid XML data. ;)
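Taking the sample from the question as the data, here is a minimal sketch of one possible continuation. It stands in for "fetch the data somehow" with an in-memory `io.BytesIO` buffer; in practice the file object might come from `open()` or `urllib.request.urlopen()`. It uses the standard library's `xml.etree.ElementTree` so it runs without lxml, and `lxml.etree.parse()` should accept the same kind of source.

```python
import io
import xml.etree.ElementTree as etree  # lxml.etree.parse() accepts the same kinds of source

# Stand-in for the fetched data: any file-like object works with parse().
data = io.BytesIO(b"""<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>""")

tree = etree.parse(data)  # returns an ElementTree, not an element
root = tree.getroot()     # the <Book_API> element

# One name per <Contributor_List>; findtext() returns the first matching child's text.
contributors = [cl.findtext("Display_Name") for cl in root.iter("Contributor_List")]
print(contributors)  # ['Jason', 'John Smith']
```

Unlike feedparser, this keeps both contributors, since the document is treated as plain XML rather than as a feed.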