Python - Getting the text of a link with Etree using Xpath

391 Views Asked by Py47 At 31 August 2022 at 01:02

I'm trying to get the text "Former United States Secretary Of State" out of this tag. I've tried many ways but cannot seem to get it.

<div class="tag"><a href="en/profession/748/former-united-states-secretary-of-state" class="">Former United States Secretary Of State</a></div>

This is my code:

site_content = etree.HTML(result)
selection = site_content.xpath(xpath_select)
content = [item.strip() for item in selection]

Every other XPath I use is working. Because this tag appears several times on the page, this is the XPath I'm using: "/html/body/div[5]/div[4]/div[5]/div[*]"

Any help in the right direction would be greatly appreciated.

Working url = https://www.blackandwhitequotes.com/en/quotes/william-jennings-bryan_1182154_1&key=2OP8jfJC1D

There are 1 best solutions below

Granitosaurus On 31 August 2022 at 05:47 BEST ANSWER

Your XPath doesn't seem to be valid for your HTML example.

In general, when building XPaths it's best to rely on classes and identifiers rather than on tree structure. So we should write //div[contains(@class,"tag")] instead of a positional path such as //div/div/div[1] (XPath positions start at 1, not 0).
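One caveat: contains(@class,"tag") is a substring test, so it would also match a class such as "hashtag-list". Below is a minimal sketch of a stricter token match, assuming lxml is installed. The sample markup and the extra class names are made up for illustration and do not come from the page in the question.

```python
from lxml import etree

# Hypothetical markup: the second div only contains "tag" as a substring.
html = """
<div class="tag"><a href="#">Alpha</a></div>
<div class="hashtag-list"><a href="#">Beta</a></div>
<div class="tag featured"><a href="#">Gamma</a></div>
"""
tree = etree.HTML(html)

# Substring test: matches any class attribute containing the letters "tag".
loose = tree.xpath("//div[contains(@class, 'tag')]/a/text()")

# Token test: pad the normalized class list with spaces and look for " tag ".
strict = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' tag ')]/a/text()"
)

print(loose)   # includes the text from the "hashtag-list" div
print(strict)  # only divs whose class list has the exact token "tag"
```

Whether the stricter form is worth the extra length depends on which other class names actually occur on the page.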

In your case you can also append //text() to the XPath (text() is a node test rather than a function) to select all of the text nodes inside your node:

from lxml import etree

html = """<div class="tag"><a href="en/profession/748/former-united-states-secretary-of-state" class="">Former United States Secretary Of State</a></div>"""
tree = etree.HTML(html)
print(tree.xpath("//div[contains(@class,'tag')]//text()")[0])
# prints: Former United States Secretary Of State

Looking for a div with a class of tag is a much more reliable way of parsing this HTML than /html/body/div[5]/div[4]/div[5]/div[*].
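Applied to the code in the question, this could look roughly like the sketch below. It assumes the page contains several div elements with the tag class, each wrapping one link; the second tag and its surrounding whitespace are invented for the example. Selecting text nodes rather than div elements matters here: when an expression selects elements, xpath() returns Element objects, which have no strip() method.

```python
from lxml import etree

# Stand-in for the downloaded page; the second tag is made up.
result = """
<div class="tag"><a href="en/profession/748/former-united-states-secretary-of-state" class="">Former United States Secretary Of State</a></div>
<div class="tag"><a href="#" class=""> Lawyer </a></div>
"""

site_content = etree.HTML(result)

# Select the text of the link inside every div whose class contains "tag".
xpath_select = "//div[contains(@class, 'tag')]/a/text()"
selection = site_content.xpath(xpath_select)

# Each item is a string here, so strip() works as in the original code.
content = [item.strip() for item in selection]
print(content)
```

With this selector the rest of the original code stays unchanged; only xpath_select differs.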
