Unexpected results parsing XML in Python


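I'm trying to extract the full text of the <title> element from the XML below, so that:

title_text = word1 Word2 word3 word4

The problem is that the code below only gives me title_text = 'word1'.

How can I get the complete text, including the text inside and after the <hlword> children?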

XML:

<response>...<results>...<grouping>...<group>...
    <doc>...
         <title>
             word1
             <hlword>Word2</hlword>
             <hlword>word3</hlword>
             word4
          </title>
          ...
    </doc>
</group>...</grouping>...</results>...</response>...

Parsing code:

from lxml import objectify
...
tree = objectify.fromstring(xml)
nodes = tree.response.results.grouping.group
for node in nodes:
    title_element = node.doc.title
    title_text = title_element.text
    print(title_text)

Answer:

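.text only returns the text that comes before the first child element, so everything inside and after the <hlword> tags is left out. Iterate over .itertext() instead, which yields every text fragment inside the element in document order: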

>>> for node in nodes:
...    print(' '.join(node.doc.title.itertext()))
...
word1 Word2 word3 word4
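
If the XML is indented as in the question, the fragments returned by .itertext() also carry the surrounding newlines and spaces, so a plain ' '.join() may leave stray whitespace. Below is a minimal sketch of one way to normalize it. It assumes a cut-down sample document and uses the standard library's xml.etree.ElementTree instead of lxml.objectify so that it runs without extra packages; the helper name full_text is made up for illustration. lxml elements provide an itertext() method as well, so the same helper should carry over.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <doc>/<title> part of the question's XML.
xml_text = """
<doc>
    <title>
        word1
        <hlword>Word2</hlword>
        <hlword>word3</hlword>
        word4
    </title>
</doc>
""".strip()


def full_text(element):
    """Return all text inside element (child text and tails included),
    with every run of whitespace collapsed to a single space."""
    # Concatenate the fragments first, then split on any whitespace,
    # so indentation and newlines from the source do not leak through.
    return " ".join("".join(element.itertext()).split())


doc = ET.fromstring(xml_text)
title = doc.find("title")

print(repr(title.text.strip()))  # 'word1' -- .text stops at the first child
print(full_text(title))          # word1 Word2 word3 word4
```

Joining the fragments with an empty string before splitting keeps inline markup faithful: text such as foo<hlword>bar</hlword> stays "foobar" rather than gaining a space that was not in the source.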