etree.ElementTree parses xml which then builds a tree, is it an efficiently-searchable data structure?

Question

181 Views Asked by user1401950 At 17 August 2025 at 05:20

I have an XML string:

<tags>
   <person1>dave jones</person1>
   <person2>ron matthews</person2>
   <person3>sally van heerden</person3>
   <place>tygervalley</place>
   <ocassion>shopping</ocassion>
</tags>

and I would like to search this XML string using search terms such as "Sally Van Heerden" or "Tygervalley".

Is it faster to use a regex to find the terms in this string, or is Python's find() method fast enough? I could also parse the string with the ElementTree XML parser, build the XML tree, and then search it, but I fear that will be too slow.

Which of these three is the fastest? Any other suggestions?

There are 2 best solutions below

**Denis** · Answer 1

Denis On 18 May 2012 at 08:47

I compared regex and lxml on small XML files and found no significant difference between them.

**Lev Levitsky** · Answer 2

The answer will really depend on what you are going to do with the search results. The only case when you should even consider not using an XML parser is when you don't remotely care about the XML document structure.

If this is the case, you can try timing all three, but building a tree is then unnecessary, and parsing may well take too long to compete with a plain substring search.
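To make the trade-off concrete, here is a minimal sketch of the two kinds of search on the XML from the question. It uses the standard-library `xml.etree.ElementTree`; the helper names `find_tags` and `contains` are made up for illustration, and lowercasing both sides is an assumption about how the mixed-case search terms should match the lowercase data.

```python
import xml.etree.ElementTree as ET

XML = """<tags>
   <person1>dave jones</person1>
   <person2>ron matthews</person2>
   <person3>sally van heerden</person3>
   <place>tygervalley</place>
   <ocassion>shopping</ocassion>
</tags>"""


def find_tags(xml_text, term):
    """Structure-aware: return tag names of elements whose text equals `term` (case-insensitive)."""
    root = ET.fromstring(xml_text)
    wanted = term.strip().lower()
    return [el.tag for el in root.iter()
            if el.text and el.text.strip().lower() == wanted]


def contains(xml_text, term):
    """Structure-blind: is `term` anywhere in the raw string, markup included?"""
    return term.lower() in xml_text.lower()


print(find_tags(XML, "Sally Van Heerden"))  # ['person3']
print(contains(XML, "Tygervalley"))         # True
print(contains(XML, "person"))              # True, but only because it matches tag names
```

The parser tells you which element matched; the substring check only tells you that the characters occur somewhere, including inside tag names.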

Time all three on a file typical of your problem to see the difference. For instance, on your small example file:

$ python -m timeit "any('tygervalley' in line for line in open('t.xml'))"
100000 loops, best of 3: 14.6 usec per loop

$ python -m timeit "import re" "for line in open('t.xml'):" "    re.findall('tygervalley', line)"
10000 loops, best of 3: 27.4 usec per loop


$ python -m timeit "from lxml.etree import parse" "tree = parse('t.xml')" "tree.xpath('//*[text()=\'tygervalley\']')"
10000 loops, best of 3: 133 usec per loop

You can experiment with which methods to call; there is always more than one option.

Edit: note how things change on a file 100 times longer:

$ python -m timeit "any('tygervalley' in line for line in open('t.xml'))"
100000 loops, best of 3: 20.8 usec per loop

$ python -m timeit "import re" "for line in open('t.xml'):" "    re.findall('tygervalley', line)"
1000 loops, best of 3: 252 usec per loop

$ python -m timeit "from lxml.etree import parse" "tree = parse('t.xml')" "tree.xpath('//*[text()=\'tygervalley\']')"
1000 loops, best of 3: 1.34 msec per loop

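Be careful interpreting the results :) The `any(...)` version stops reading at the first matching line, so its time reflects where the match sits rather than how long the file is, whereas the regex loop and the parser always process the whole file.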
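The same kind of comparison can be run in-process with the `timeit` module. The sketch below is not a reproduction of the measurements above: it swaps lxml for the standard-library `xml.etree.ElementTree`, searches an in-memory string instead of `t.xml`, and puts the only match on the last row so that an early exit cannot flatter any method. The names `build_doc`, `compare`, and the filler row are hypothetical, chosen only for this example.

```python
import re
import timeit
import xml.etree.ElementTree as ET

TERM = "tygervalley"
FILLER = "   <place>somewhere else</place>\n"  # hypothetical non-matching row


def build_doc(filler_rows, term=TERM):
    """Build a test document with the only match on the last row."""
    return "<tags>\n" + FILLER * filler_rows + f"   <place>{term}</place>\n</tags>"


def by_substring(doc, term=TERM):
    return term in doc


def by_regex(doc, pattern=re.compile(re.escape(TERM))):
    return pattern.search(doc) is not None


def by_tree(doc, term=TERM):
    # Parsing is included on purpose: it is part of the cost being compared.
    root = ET.fromstring(doc)
    return any(el.text == term for el in root.iter())


def compare(filler_rows, number=200):
    """Return average seconds per call for each method on one document size."""
    doc = build_doc(filler_rows)
    methods = {"substring": by_substring, "regex": by_regex, "tree": by_tree}
    return {name: timeit.timeit(lambda fn=fn: fn(doc), number=number) / number
            for name, fn in methods.items()}


for rows in (5, 500):
    for name, seconds in compare(rows).items():
        print(f"{rows:>4} rows  {name:<9} {seconds * 1e6:8.2f} usec per call")
```

The absolute numbers will vary by machine and Python version; what matters is how each method scales as the document grows.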
