How to get a href attribute value in xml content (atom feed)?

I'm saving the content (an Atom feed, i.e. XML) from a GET request as content = response.text, and the content looks like this:

<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">title-a</title>
    <subtitle type="text">content: application/abc</subtitle>
    <updated>2021-08-05T16:29:20.202Z</updated>
    <id>tag:tag-a,2021-08:27445852</id>
    <generator uri="uri-a" version="v-5.1.0.3846329218047">abc</generator>
    <author>
        <name>name-a</name>
        <email>email-a</email>
    </author>
    <link href="url-a" rel="self"/>
    <link href="url-b" rel="next"/>
    <link href="url-c" rel="previous"/>
</feed>

How can I get the value "url-b" of the href attribute with rel="next"?

I tried it with the ElementTree module, for example:

import requests
from xml.etree import ElementTree

response = requests.get("myurl", headers={"Authorization": f"Bearer {my_access_token}"})
content = response.text

tree = ElementTree.fromstring(content)

tree.find('.//link[@rel="next"]')
# or
tree.find('./link').attrib['href']

but neither attempt worked.

I appreciate any help and thank you in advance.

If there is a simpler solution (maybe feedparser), I'd welcome that too.

Answer by balderman (best answer)

How can I get the value "url-b" of the href attribute with rel="next"?

See below:

from xml.etree import ElementTree as ET

xml = '''<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">title-a</title>
    <subtitle type="text">content: application/abc</subtitle>
    <updated>2021-08-05T16:29:20.202Z</updated>
    <id>tag:tag-a,2021-08:27445852</id>
    <generator uri="uri-a" version="v-5.1.0.3846329218047">abc</generator>
    <author>
        <name>name-a</name>
        <email>email-a</email>
    </author>
    <link href="url-a" rel="self"/>
    <link href="url-b" rel="next"/>
    <link href="url-c" rel="previous"/>
</feed>'''

root = ET.fromstring(xml)
# The feed declares a default namespace, so each tag in the path must be qualified with it.
links = root.findall('.//{http://www.w3.org/2005/Atom}link[@rel="next"]')
for link in links:
    print(link.get("href"))

Output:

url-b
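Why the question's attempts fail: the xmlns attribute on <feed> places every element in the Atom namespace, and ElementTree stores such tags as {uri}localname (here root.tag is '{http://www.w3.org/2005/Atom}feed'). So './/link' matches nothing, find() returns None, and calling .attrib on that None raises AttributeError. Rather than spelling out the full {uri} in each path, find() and findall() also accept a prefix-to-URI mapping as their second argument. A minimal sketch of that variant; find_next_href, SAMPLE_FEED and the "atom" prefix are illustrative names chosen here, not part of any API:

```python
from xml.etree import ElementTree as ET

# Prefix -> namespace URI. "atom" is an arbitrary prefix chosen here;
# it does not need to appear in the document itself.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Trimmed copy of the feed from the question, used as sample input.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
    <link href="url-a" rel="self"/>
    <link href="url-b" rel="next"/>
    <link href="url-c" rel="previous"/>
</feed>"""


def find_next_href(content):
    """Return the href of the <link rel="next"> child of an Atom <feed>, or None."""
    root = ET.fromstring(content)
    # "atom:link" is expanded to "{http://www.w3.org/2005/Atom}link" via ATOM_NS;
    # the path only searches the direct children of <feed>.
    link = root.find("atom:link[@rel='next']", ATOM_NS)
    return link.get("href") if link is not None else None


print(find_next_href(SAMPLE_FEED))  # url-b
```

Because find() returns None when nothing matches, the function checks for that before calling get(), so a page without a rel="next" link yields None instead of an exception.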
Answer by zx485

You can use this XPath 1.0 expression:

./*[local-name()="feed"]/*[local-name()="link" and @rel="next"]/@href

Evaluated with the document node as its context, this should return "url-b".
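ElementTree implements only a limited subset of XPath, and local-name() is not part of it, so the expression above needs a full XPath 1.0 engine such as lxml. If you want to stay in the standard library, Python 3.8+ offers a rough equivalent of the local-name() test: the {*} wildcard, which matches a tag name in any namespace or in none. A sketch under that assumption; next_hrefs_any_namespace and SAMPLE_FEED are hypothetical names:

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the feed from the question, used as sample input.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
    <link href="url-a" rel="self"/>
    <link href="url-b" rel="next"/>
    <link href="url-c" rel="previous"/>
</feed>"""


def next_hrefs_any_namespace(content):
    """Return the hrefs of rel="next" <link> children of a <feed> root, ignoring namespaces."""
    root = ET.fromstring(content)
    # Compare only the local part of the root tag ("{uri}feed" -> "feed"),
    # mirroring *[local-name()="feed"] in the XPath expression.
    if root.tag.rsplit("}", 1)[-1] != "feed":
        return []
    # "{*}link" (Python 3.8+) matches <link> in any namespace or in none,
    # roughly like *[local-name()="link"]; the path searches root's direct children.
    return [link.get("href") for link in root.findall("{*}link[@rel='next']")]


print(next_hrefs_any_namespace(SAMPLE_FEED))  # ['url-b']
```

Like the XPath version, this ignores which namespace the feed uses, which is convenient but also means it would accept a <link> from an unrelated vocabulary.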