Getting a whole tag as a string with lxml.html


I'm new to lxml and want to parse a page retrieved with requests. My HTML is this:

<html>
<body>
<h1 class="entry-title">
    <a href="http://a.com" rel="bookmark">
    bla bla bla
    </a>
</h1>
</body>
</html>

and I want to have a string that looks like this:

"""<h1 class="entry-title">
    <a href="http://a.com" rel="bookmark">
    bla bla bla
    </a>
</h1>"""

What would the code be in Python 3.4?

Best answer

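Try something like this:

from lxml.html import document_fromstring, tostring

doc = document_fromstring(YOUR_HTML_STRING)
# xpath() returns a list of matching elements; take the first <h1>
h1_element = doc.xpath("//h1")[0]
# encoding="unicode" makes tostring() return a str instead of bytes on Python 3
h1 = tostring(h1_element, encoding="unicode")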