Python xml.etree - how to search for n-th element in an xml with namespaces?

EDIT

It looks like I wasn't clear enough below. The problem is that when I combine node positions (e.g. /element[1]) with namespaces, XPath expressions do not work in xml.etree. I found a partial answer: lxml handles them well, so I can use it instead of xml.etree, but I'm leaving the question open for future reference.

So, to be clear, the problem statement is: XPath expressions that combine positions and namespaces do not work in xml.etree. At least not for me.

Original question below:

I'm trying to use positions in XPath expressions processed by the findall method of the xml.etree.ElementTree.Element class. For some reason, findall does not work when namespaces and positions are used together.

See the following example:

Works with no namespaces

>>> from xml.etree import ElementTree as ET
>>> xml = """
...             <element>
...                 <system_name>TEST</system_name>
...                 <id_type>tradeseq</id_type>
...                 <id_value>31359936123</id_value>
...             </element>
...         """
>>> root = ET.fromstring(xml)
>>> matches = root.findall('./system_name')
>>> matches
[<Element 'system_name' at 0x0000023825CDB9F0>]
>>> matches[0].tag
'system_name'
>>> matches[0].text
'TEST'
>>> # Lookup with position: works well, returns one element
>>> matches = root.findall('./system_name[1]')
>>> matches
[<Element 'system_name' at 0x0000023825CDB9F0>]
>>> matches[0].text
'TEST'

Does not work with namespaces

>>> xml = """
...             <element xmlns="namespace">
...                 <system_name>TEST</system_name>
...                 <id_type>tradeseq</id_type>
...                 <id_value>31359936123</id_value>
...             </element>
...         """
>>> root = ET.fromstring(xml)
>>> matches = root.findall(path='./system_name', namespaces={'': 'namespace'})
>>> matches
[<Element '{namespace}system_name' at 0x0000023825CDBD60>]
>>> matches[0].text
'TEST'
>>> # Lookup with position and namespace: I expect one element here, as in the no-namespace example, but it returns an empty list
>>> matches = root.findall(path='./system_name[1]', namespaces={'': 'namespace'})
>>> matches
[]

Am I missing something, or is this a bug? If I should use any other library that better processes xml, could you name one, please?

Answer by Hermann12:

It works as documented. Please try this syntax, which maps the namespace to an explicit prefix:

ns = {'xmlns': 'namespace'}

for elem in root.findall(".//xmlns:system_name", ns):
    print(elem.tag)

Remark: it also works with an empty key, but I assume this is not the correct usage.

ns = {'': 'namespace'}

for elem in root.findall(".//system_name", ns):
    print(elem.tag)

If you have only one namespace definition, you can also use {*}tag_name:

for elem in root.findall(".//{*}system_name"):
    print(elem.tag)

A search restricted to direct children (./ instead of .//) also works fine with the empty key:

ns = {'': 'namespace'}

for elem in root.findall("./system_name", ns):
    print(elem.tag)
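
For the case the question actually asks about (a namespace combined with a position such as [1]), here is a minimal sketch of two ways to write the path without the empty key. It reuses the XML from the question; the prefix name "ns" and the variable names are arbitrary choices for illustration. My assumption, not confirmed above, is that the empty-key mapping is also applied to the 1 inside the predicate, which would explain the empty list; a non-empty prefix or the {uri}tag form avoids that.

```python
from xml.etree import ElementTree as ET

XML = """
<element xmlns="namespace">
    <system_name>TEST</system_name>
    <id_type>tradeseq</id_type>
    <id_value>31359936123</id_value>
</element>
"""

root = ET.fromstring(XML)

# Map the namespace URI to a non-empty prefix ("ns" is an arbitrary name).
ns = {"ns": "namespace"}

# Prefix plus position: only the tag name is qualified, [1] stays an index.
by_prefix = root.findall("./ns:system_name[1]", ns)

# Fully qualified {uri}tag form plus position; no namespaces mapping needed.
by_clark = root.findall("./{namespace}system_name[1]")

for matches in (by_prefix, by_clark):
    print([(elem.tag, elem.text) for elem in matches])
```

Both lookups should return the single system_name element. If neither form fits, lxml, which the question already mentions, remains an alternative.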