I am working with a program that modifies the content and structure of an XML document. The program has a set of class methods, each of which modifies a different XML tag. I have set up the configuration file to change the content of five different tags, and the program changes the content of those tags correctly.

My problem is that the program also removes:

  1. "Unnecessary" whitespace inside some tags (the space before />),
  2. Explicit closing tags of elements that have no text content; these are rewritten as self-closing tags (someone else will add the text content later).

Example:

None of these tags are targeted by the configuration.

- <prodDate date="2001-11-12" />
- <timePrd event="single" date="2000-01-01" />
- <collDate event="start" date="2000-01-01" />
- <collDate event="end" date="2000-01-01" />
+ <prodDate date="2001-11-12"/>
+ <timePrd event="single" date="2000-01-01"/>
+ <collDate event="start" date="2000-01-01"/>
+ <collDate event="end" date="2000-01-01"/>
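A minimal reproduction that uses only lxml and none of the program's configuration shows the same effect: parsing and re-serializing is enough to drop the space before `/>`. Whitespace inside a tag's markup is not part of the XML data model, so the parser does not store it and the serializer has nothing to restore. The wrapping `<root>` element is hypothetical and only makes the snippet well-formed.

```python
from lxml import etree

# Two elements from the diff above, written with a space before "/>".
# The <root> wrapper is hypothetical, added only for well-formedness.
original = b'<root><prodDate date="2001-11-12" /><collDate event="start" date="2000-01-01" /></root>'

root = etree.fromstring(original)           # the space inside the tag is not kept in the tree
serialized = etree.tostring(root).decode()  # lxml writes "/>" with no space before it
print(serialized)
# <root><prodDate date="2001-11-12"/><collDate event="start" date="2000-01-01"/></root>
```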


- <complete></complete>
- <dataSrc></dataSrc>
+ <complete/>
+ <dataSrc/>
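A sketch of the empty-element case, assuming the lxml behavior described here holds for the version in use: when an element has no children and its `.text` is `None`, lxml writes it as `<tag/>`. Setting `.text` to an empty string instead makes lxml write an explicit start/end pair. The parsed tree cannot tell `<complete></complete>` from `<complete/>`, so the tag names have to be supplied from outside. The `EXPLICIT_END_TAGS` set is hypothetical.

```python
from lxml import etree

# Hypothetical: names of elements that should keep an explicit end tag.
EXPLICIT_END_TAGS = {"complete", "dataSrc"}

root = etree.fromstring(b"<root><complete></complete><dataSrc></dataSrc></root>")
before = etree.tostring(root).decode()
print(before)  # <root><complete/><dataSrc/></root>

for el in root.iter():
    # text None + no children -> serialized as <tag/>;
    # text "" -> serialized as <tag></tag>.
    if el.tag in EXPLICIT_END_TAGS and el.text is None and len(el) == 0:
        el.text = ""

after = etree.tostring(root).decode()
print(after)  # <root><complete></complete><dataSrc></dataSrc></root>
```

This works for the empty-element case only. There is no comparable setting for the space before `/>`, because that space never reaches the tree.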

Is this a built-in lxml feature (meant to keep the XML document somewhat cleaner)? If so, is there a way to disable it?


I'm running diff checks on thousands of XML files (modified <-> original), and this causes a lot of unwanted entries in the log files.
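For the diff check itself, one possible workaround is to pass both files through the same lxml serializer before comparing them. Formatting-only differences such as `" />"` vs `"/>"` or `<complete></complete>` vs `<complete/>` then disappear, and only changes in the parsed content remain. This is a minimal sketch. The `normalize` helper and the inline sample documents are hypothetical, and reading real files (for example with `open(path, "rb").read()`) is left out.

```python
import difflib

from lxml import etree


def normalize(xml_bytes):
    """Re-serialize through lxml so serializer-only differences vanish (hypothetical helper)."""
    root = etree.fromstring(xml_bytes)
    return etree.tostring(root, encoding="unicode").splitlines()


# Hypothetical samples: same content, different formatting of empty elements.
original = b'<root>\n<prodDate date="2001-11-12" />\n<complete></complete>\n</root>'
modified = b'<root>\n<prodDate date="2001-11-12"/>\n<complete/>\n</root>'

diff = list(difflib.unified_diff(normalize(original), normalize(modified), lineterm=""))
print(diff)  # [] -> no formatting-only entries in the log
```

The trade-off is that the diff is then taken between normalized copies, not the files exactly as stored on disk.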
