How to get only the tags that have text in lxml


I'm using lxml and I have an XML document like this:

<UploadFile>
<Eu>
<AUTO_ID>4</AUTO_ID>
<Meter>000413031</Meter>
</Eu>
</UploadFile>

How can I get only the tags that have text, like AUTO_ID and Meter, but not UploadFile or Eu?

I have tried:

    tree = lxml.etree.parse(xmlfile)
    root = tree.getroot()

    for node in root.iter('*'):
        if node.text != None:
            print(node.tag,node.text)

But this still prints all the tags. I only want the tags that have text. What can I do? Any help is appreciated. Best regards!

There are 2 best solutions below

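The text of container elements such as UploadFile and Eu is not None: it is the whitespace (newlines) before their first child. In your for loop, strip that whitespace with strip() and then either check that the remaining length is greater than 0, or rely on an empty string being falsy. Because node.text is None for an element with no text at all, test for that before calling strip().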

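Option 1:

    import lxml.etree  # "import lxml" alone does not load the etree submodule

    tree = lxml.etree.parse("my_xml.xml")
    root = tree.getroot()

    for node in root.iter('*'):
        # node.text is None for elements with no text, so guard before stripping
        if node.text is not None and len(node.text.strip()) > 0:
            print(node.tag, node.text)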

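Option 2:

    import lxml.etree  # "import lxml" alone does not load the etree submodule

    tree = lxml.etree.parse("my_xml.xml")
    root = tree.getroot()

    for node in root.iter('*'):
        # None and whitespace-only strings are both falsy here
        if node.text and node.text.strip():
            print(node.tag, node.text)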
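
To see why the original loop prints every tag, here is a minimal, self-contained sketch. It assumes the XML from the question is held in a string rather than a file. The `<Empty/>` element, the `XML` constant and the `has_text` helper are additions for illustration only. If lxml is not installed, the sketch falls back to the standard library's `xml.etree.ElementTree`, which behaves the same way for the calls used here.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module is close enough for this small demo.
    import xml.etree.ElementTree as etree

# The XML from the question, plus a hypothetical <Empty/> element
# to show the case where node.text is None.
XML = """<UploadFile>
<Eu>
<AUTO_ID>4</AUTO_ID>
<Meter>000413031</Meter>
<Empty/>
</Eu>
</UploadFile>"""


def has_text(node):
    """Return True if node.text contains something other than whitespace."""
    return bool(node.text and node.text.strip())


root = etree.fromstring(XML)

# Container elements are not None: their text is the newline
# that precedes their first child.
print(repr(root.text))

# Keep only the elements whose own text is non-blank.
tags_with_text = [(node.tag, node.text) for node in root.iter('*') if has_text(node)]
print(tags_with_text)
```

Here `UploadFile` and `Eu` are skipped because their text is only a newline, and `Empty` is skipped because its text is `None`.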

Unlike xml.etree, lxml supports more complex XPath expressions, including one that returns all descendant elements that have a child text node that isn't empty or whitespace-only:

    for node in root.xpath(".//*[text()[normalize-space()]]"):
        print(node.tag,node.text)
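
As a rough, self-contained sketch, the same XPath can be run against the XML from the question. It assumes the document is held in a string, and the names `XML`, `matches` and `tags` are illustrative only.

```python
from lxml import etree

# The XML from the question, held in a string for this demo.
XML = """<UploadFile>
<Eu>
<AUTO_ID>4</AUTO_ID>
<Meter>000413031</Meter>
</Eu>
</UploadFile>"""

root = etree.fromstring(XML)

# Select descendants that have at least one text-node child whose
# whitespace-normalized value is non-empty.
matches = root.xpath(".//*[text()[normalize-space()]]")
for node in matches:
    print(node.tag, node.text)

tags = [node.tag for node in matches]
```

Note that `.//*` only searches below `root`, so the root element itself is never returned. If the root may carry text of its own, `descendant-or-self::*` can be used in place of `.//*`.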