When I'm manipulating XML parsed with the Python lxml module (specifically lxml.objectify, but I don't think it makes a difference), how can I preserve CDATA sections?
Consider the following session:
>>> from lxml import objectify, etree
>>> xml = '''
<Root>
<Child>
<![CDATA[abcd]]>
</Child>
</Root>
'''
>>> parser = objectify.makeparser(strip_cdata=False)
>>> parsed = objectify.XML(xml, parser=parser)
>>> etree.tostring(parsed)
'<Root><Child><![CDATA[abcd]]></Child></Root>'
>>> type(parsed.Child)
<type 'lxml.objectify.StringElement'>
>>> parsed.Child.text
'abcd'
>>> parsed.Child = 'efgh'
>>> etree.tostring(parsed)
'<Root><Child xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="str">efgh</Child></Root>'
I'd like that last line to still contain the <![CDATA[...]]> section, but I can't see any way of either preserving it or recreating it. Accessing the content of the <Child> element produces a bare string, and assigning new content to that element silently drops the CDATA section.
What's the right way of doing this?
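For comparison, here is a minimal sketch of the behaviour I'm after, using plain lxml.etree rather than lxml.objectify. It assumes that wrapping the new value in etree.CDATA is an acceptable way to recreate the section. The variable names are only illustrative, and the sketch does not show whether the same assignment can be made through the objectify attribute API.

```python
from lxml import etree

xml = "<Root><Child><![CDATA[abcd]]></Child></Root>"

# strip_cdata=False keeps CDATA sections as CDATA nodes
# instead of converting them to plain text while parsing.
parser = etree.XMLParser(strip_cdata=False)
root = etree.fromstring(xml, parser)

child = root.find("Child")

# Wrapping the new value in etree.CDATA asks the serializer
# to write it back out as a CDATA section.
child.text = etree.CDATA("efgh")

# tostring() returns bytes on Python 3, so decode for printing.
result = etree.tostring(root).decode("ascii")
print(result)
```

Reading child.text afterwards still gives a bare string, so the CDATA wrapper has to be reapplied on every assignment.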