How can I differentiate regular whitespaces and escaped ones (&#32;) when parsing XML with xml.etree.ElementTree (python)

338 Views Asked by Grisha S At 19 December 2013 at 07:14

I'm using xml.etree.ElementTree to parse an XML file. How can I force it to either strip only regular whitespace from the text (just regular spaces, not &#32;), or keep the spaces and leave the escapes as they are? Here is my problem:

import xml.etree.ElementTree

xml_text = """
<root>
    <mytag>
        data_with_space&#32;
    </mytag>
</root>"""
root = xml.etree.ElementTree.fromstring(xml_text)
mytag = root.find("mytag")
print("original text: ", repr(mytag.text))
print("stripped text: ", repr(mytag.text.strip()))

It prints:

original text:  '\n        data_with_space \n    '
stripped text:  'data_with_space'

What I need:

'data_with_space '

or (which I can escape by other means):

'data_with_space&#32;'

A solution using xml.etree.ElementTree is preferable, because otherwise I'd have to rewrite a whole lot of code.


Timothy On 19 December 2013 at 09:12 BEST ANSWER

The standard XML library treats &#32; and ' ' as equal. The parser resolves character references while it parses, so if you apply fromstring(xml_text) directly there is no way to avoid the conversion, and the two can no longer be told apart afterwards. The only way to stop the unescaping is to translate the reference into something else before applying fromstring(), and translate it back afterwards.

import xml.etree.ElementTree

# Hide character references from the parser, then restore them afterwards.
# Assumes the marker "|STOP_ESCAPE|" never occurs in the real data.
def stop_escape(text):
    return text.replace("&#", "|STOP_ESCAPE|")

def resume_escape(text):
    return text.replace("|STOP_ESCAPE|", "&#")

xml_text = """
<root>
    <mytag>
        data_with_space&#32;
    </mytag>
</root>"""
root = xml.etree.ElementTree.fromstring(stop_escape(xml_text))
mytag_txt = resume_escape(root.find("mytag").text)
print("original text: ", repr(mytag_txt))
print("stripped text: ", repr(mytag_txt.strip()))

You would get:

original text:  '\n        data_with_space&#32;\n    '
stripped text:  'data_with_space&#32;'
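
If you want the first form from the question ('data_with_space ') rather than the raw &#32;, the same placeholder trick can be extended: strip while the reference is still hidden, then decode it yourself. This is only a sketch, assuming Python 3 and assuming the marker never appears in your data; the helper names protect_char_refs, decode_char_refs and strip_keeping_escaped are made up for this example and are not part of ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker; assumed not to occur anywhere in the real data.
PLACEHOLDER = "|STOP_ESCAPE|"


def protect_char_refs(xml_text):
    """Hide numeric character references (&#...;) from the XML parser."""
    return xml_text.replace("&#", PLACEHOLDER)


def decode_char_refs(text):
    """Turn protected references such as |STOP_ESCAPE|32; into characters."""
    pattern = re.escape(PLACEHOLDER) + r"(x[0-9A-Fa-f]+|[0-9]+);"

    def to_char(match):
        ref = match.group(1)
        # "x41" is a hexadecimal reference, "32" a decimal one.
        code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        return chr(code)

    return re.sub(pattern, to_char, text)


def strip_keeping_escaped(xml_text, tag):
    """Strip literal whitespace from the tag's text, keep escaped characters."""
    root = ET.fromstring(protect_char_refs(xml_text))
    protected = root.find(tag).text
    # Strip first (the escaped space is still hidden), then decode.
    return decode_char_refs(protected.strip())


xml_text = """
<root>
    <mytag>
        data_with_space&#32;
    </mytag>
</root>"""

print(repr(strip_keeping_escaped(xml_text, "mytag")))  # 'data_with_space '
```

Note that the replacement is applied to the whole document, so references inside attribute values are hidden too; decode those the same way if you read them.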
