Python, lxml.html: Need a generic function to return innerhtml of any element

Question

86 Views Asked by spacedog At 26 June 2025 at 15:42

I found a nice function here by Siva Kannan, but it's not working in my case. I'm using lxml.html, not etree, to get the data from the page. When I use etree I get this exception:

lxml.etree.XMLSyntaxError: error parsing attribute name

Below is his example, modified to first get data from a Yellow Pages page and then attempt to get the innerhtml of a specific div tag.

Any help would be great and should help many people.

Thank you

from lxml import etree
import requests, time, socket
import lxml.html as lxml

def innerXML(elem):
    elemName = elem.xpath('name(/*)')
    resultStr = ''
    for e in elem.xpath('/'+ elemName + '/node()'):
        if(isinstance(e, str) ):
            resultStr = resultStr + ''
        else:
            resultStr = resultStr + etree.tostring(e, encoding='unicode')

    return resultStr

# This works nicely, but not for my data
# XMLElem = etree.fromstring("<div>I am<xxxxxx>Jhon <last.xxxxx> Corner</last.xxxxx></xxxxxx>.I    work as <job>software engineer</job><end meta='bio' />.</div>")
# print(innerXML(XMLElem))

response = requests.get('https://www.yellowpages.com/washington-dc/mip/bnsf-railway-496598824')
data = response.text
# The next line is how I need to get data for all my work.
# tree = lxml.fromstring(data)

# Siva Kannan's way
tree = etree.fromstring(data)
div_node = tree.xpath("//dd[@class='open-hours']")
# div_node = tree.xpath("//dd[@class='open-hours']//div")  # When using lxml.fromstring (my normal    code) this returns a list when using 

div_html = innerXML(div_node)
print(div_html)


**Forensic_07** · Answer 1

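Remove the line from lxml import etree, then replace etree.tostring with lxml.tostring and etree.fromstring with lxml.fromstring. In your script, lxml is the alias created by import lxml.html as lxml, so both calls then go through the HTML parser and serializer instead of the strict XML ones.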
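To illustrate why the parser matters, here is a minimal sketch comparing the two parsers on the same snippet. The markup in SNIPPET is a made-up example, not taken from the page in the question; it uses a valueless attribute, which HTML allows and XML does not.

```python
from lxml import etree
import lxml.html

# Hypothetical snippet: "hidden" has no value, which is legal HTML but not XML.
SNIPPET = "<div hidden>Open <b>24 hours</b></div>"

# The XML parser is strict and rejects markup that is not well-formed XML.
try:
    etree.fromstring(SNIPPET)
    xml_failed = False
except etree.XMLSyntaxError as exc:
    xml_failed = True
    print("XML parser rejected the snippet:", exc)

# The HTML parser is lenient and accepts the same input.
parsed = lxml.html.fromstring(SNIPPET)
print("HTML parser produced a", parsed.tag, "element")
```

Real pages tend to contain many such constructs, so the HTML parser is usually the safer choice for scraped data.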

As a side note, this code will also produce an error because div_node will be a list of nodes rather than a node, but that should be easy to fix.
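One way to handle the list is to loop over the XPath result and build the inner HTML from each element's leading text and its serialized children. The following is a minimal sketch under stated assumptions, not the original function: inner_html and SAMPLE are hypothetical names, and the inline HTML stands in for the downloaded page so the example runs without a network request.

```python
import html
import lxml.html

# Hypothetical stand-in for response.text from the question.
SAMPLE = """
<html><body><dl>
<dd class="open-hours">Hours: <div><span>Mon - Fri</span> 9:00 am - 5:00 pm</div> (local time)</dd>
</dl></body></html>
"""


def inner_html(elem):
    """Return the markup inside elem, without elem's own start and end tags."""
    # Text directly after the opening tag; escaped because lxml stores it unescaped.
    parts = [html.escape(elem.text or "", quote=False)]
    for child in elem:
        # tostring() serializes the child and, by default, the text that follows it.
        parts.append(lxml.html.tostring(child, encoding="unicode"))
    return "".join(parts)


tree = lxml.html.fromstring(SAMPLE)

# xpath() returns a list, so process each matching element separately.
nodes = tree.xpath("//dd[@class='open-hours']")
results = [inner_html(node) for node in nodes]

for result in results:
    print(result)
```

Working on the element itself, rather than on an absolute XPath such as '/' + name, keeps the function generic: it returns the contents of whichever element is passed in, not of the document root.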
