I am parsing an XML document using the lxml library. There is a newline character (\n) in an attribute value:
from lxml import etree
root = etree.fromstring('<root attr1="line1\nline2"/>')
print(etree.tostring(root).decode())
Result:
<root attr1="line1 line2"/>
That is, the parser replaces the newline character with a space. Is there any way to leave the newline character in the attribute value when parsing?
I know you can add a newline character when creating the XML:
from lxml import etree
root = etree.Element('root', attr1='line1\nline2')
print(root.attrib['attr1'])
print(etree.tostring(root).decode())
Result:
line1
line2
<root attr1="line1&#10;line2"/>
But how can I do the same when parsing?
Update
The behaviour seems to depend on the OS. The problem described above occurs on Windows; when I checked my example on Linux, the newline characters were preserved. So the remaining question is: is there a way to disable the conversion of the newline character to a space on Windows?
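For context, the XML specification has parsers normalize a literal line break inside an attribute value to a space, while a character reference such as &#10; is kept as a real newline. The snippet below is only a minimal sketch of that difference, assuming the XML source can be changed to use the character reference; the variable names literal and escaped are illustrative and not from the original example.

```python
from lxml import etree

# Literal newline inside the attribute value: subject to the parser's
# attribute-value normalization (reported above as becoming a space).
literal = etree.fromstring('<root attr1="line1\nline2"/>')

# Newline written as the character reference &#10;: the reference is
# resolved to a newline and is not normalized to a space.
escaped = etree.fromstring('<root attr1="line1&#10;line2"/>')

print(repr(literal.get('attr1')))
print(repr(escaped.get('attr1')))
```

If the document is produced by a tool you control, writing line breaks in attribute values as &#10; is probably the most portable way to have them survive parsing.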
I think the below can help: