I'm trying to get the text "Former United States Secretary Of State" out of this tag. I've tried many ways but cannot seem to get it.
<div class="tag"><a href="en/profession/748/former-united-states-secretary-of-state" class="">Former United States Secretary Of State</a></div>
This is my code:
from lxml import etree  # etree.HTML is provided by lxml

site_content = etree.HTML(result)  # result holds the page's HTML source
selection = site_content.xpath(xpath_select)
content = [item.strip() for item in selection]
Every other XPath is working. There are several of these tags on the page, so this is the XPath I'm using: "/html/body/div[5]/div[4]/div[5]/div[*]"
Any help in the right direction would be greatly appreciated.
Working url = https://www.blackandwhitequotes.com/en/quotes/william-jennings-bryan_1182154_1&key=2OP8jfJC1D
Your XPath doesn't seem to be valid for your HTML example.
In general, when building XPaths it's best to rely on classes and identifiers rather than tree structure. So we should write
//div[contains(@class,"tag")]
instead of
//div/div/div[0]
etc. In your case you can also append //text() to select all of the inner text of your node, e.g. //div[contains(@class,"tag")]//text()
Looking for a div with a class of tag will be a much more reliable way of parsing this HTML than /html/body/div[5]/div[4]/div[5]/div[*]
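As a minimal sketch of the class-based approach, the snippet below runs the //text() selector against the markup from the question. The html/body wrapper around the div is an assumption made so the example is self-contained; on the real page, result would be the downloaded HTML.

```python
from lxml import etree

# Markup copied from the question; the <html>/<body> wrapper is an
# assumption added only to make this example self-contained.
result = """
<html><body>
<div class="tag"><a href="en/profession/748/former-united-states-secretary-of-state" class="">Former United States Secretary Of State</a></div>
</body></html>
"""

site_content = etree.HTML(result)

# Select every text node under any div whose class attribute contains "tag".
xpath_select = '//div[contains(@class, "tag")]//text()'
selection = site_content.xpath(xpath_select)

# Strip whitespace and drop text nodes that are empty after stripping.
content = [item.strip() for item in selection if item.strip()]
print(content)  # ['Former United States Secretary Of State']
```

Note that contains(@class, "tag") also matches classes that merely include the substring, such as "tags", so on a page with such classes //div[@class="tag"] may be the safer choice.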