Grabbing a value from HTML by XPath in Python


I want to use XPath to grab the phrase WahtIwant from the following string:

a="<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/><b><font size='2'>Text: </b>WahtIwant</font><br/><center>"

I tried the following, but it returns only 'Text: ' instead of WahtIwant:

from lxml import html

tree = html.fromstring(a)
tree.xpath('//font[@size="2"]/text()')
# returns ['Text: ']

There are 2 best solutions below

On BEST ANSWER

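Use lxml and the element's tail property, which holds the text that directly follows the element's closing tag. Because the markup closes </b> before </font>, the parser ends the font element at </b>, so WahtIwant becomes the tail of the <b> that wraps it: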

>>> import lxml.html
>>> 
>>> a = "<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/><b><font size='2'>Text: </b>WahtIwant</font><br/><center>"
>>> root = lxml.html.fromstring(a)
>>> [x.tail for x in root.xpath('//font[@size="2"]/parent::b')]
['WahtIwant']
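
To see why tail is the right place to look, it can help to print where the parser put each piece of text. This is only a small sketch, assuming lxml is installed; the variable names font and parent_b are illustrative and not part of the original answer.

```python
import lxml.html

a = "<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/><b><font size='2'>Text: </b>WahtIwant</font><br/><center>"
root = lxml.html.fromstring(a)

# The markup closes </b> before </font>, so the parser repairs it:
# <font> ends up holding only 'Text: ', and 'WahtIwant' becomes the
# tail of the enclosing <b>.
font = root.xpath('//font[@size="2"]')[0]
parent_b = font.getparent()

print(repr(font.text))      # text inside <font>
print(repr(parent_b.tag))   # the element that wraps <font>
print(repr(parent_b.tail))  # text directly after the closing </b>
```

Printing text and tail side by side shows that the wanted phrase is not inside the font element at all, which is why the /text() query in the question cannot reach it.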

From XPath's point of view, the text you want is a following sibling of the <b> element that is the parent of font[@size="2"]:

tree.xpath('//font[@size="2"]/parent::b/following-sibling::text()')

Alternatively, you can use an XPath that selects the <b> element that has a child font whose size attribute equals 2, and then selects the text node following that <b>:

tree.xpath('//b[font/@size="2"]/following-sibling::text()')
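
Both expressions can be tried together in one script. The following is a minimal sketch, assuming lxml is installed and using the string from the question; the names via_parent, via_predicate and wanted are illustrative only.

```python
from lxml import html

a = "<b>AAA:</b> BBB<br/><br/><img src='line.gif' /><br/><br/><b><font size='2'>Text: </b>WahtIwant</font><br/><center>"
tree = html.fromstring(a)

# Start at <font size="2">, step up to its parent <b>, then take the
# text nodes that follow that <b> among its siblings.
via_parent = tree.xpath('//font[@size="2"]/parent::b/following-sibling::text()')

# Select the <b> directly with a predicate on its <font> child, then
# take the same following-sibling text nodes.
via_predicate = tree.xpath('//b[font/@size="2"]/following-sibling::text()')

# following-sibling::text() returns every later sibling text node, so
# keep the first non-blank one in case the real page has more of them.
wanted = next((t.strip() for t in via_parent if t.strip()), None)

print(via_parent)
print(via_predicate)
print(wanted)
```

If the real page has more text after the <b>, appending [1] to either expression (following-sibling::text()[1]) should limit the result to the nearest text node.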