I need to wrap all plain-text chunks in paragraphs, but there could be an existing nested paragraph, which should be skipped. How would I tackle this?
I'm having a difficult time understanding how to wrap a run of plain text in one paragraph while skipping existing paragraphs.
Given XML:
<section xmlns="http://www.w3.org/1999/xhtml">
<div>
test test
<p>test</p>
<ins>INS</ins>
text
</div>
</section>
Expected Result:
<section xmlns="http://www.w3.org/1999/xhtml">
<div>
<p>test test</p>
<p>test</p>
<p>
<ins>INS</ins>
text
</p>
</div>
</section>
Here is an approach that uses a simple recursive algorithm to partition the div content at its p nodes:
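A minimal sketch of such a recursive partition, not necessarily the exact code this answer was built around. The function names `local:wrap`, `local:partition` and `local:transform` and the file name `input.xml` are hypothetical, and the sketch assumes that `p` elements appear only as direct children of `div`:

```xquery
declare default element namespace "http://www.w3.org/1999/xhtml";

(: Wrap one run of non-p nodes in a p, unless the run is whitespace only. :)
declare function local:wrap($run as node()*) as element(p)? {
  if (normalize-space(string-join(for $n in $run return string($n), "")) = "")
  then ()
  else element p { $run }
};

(: Split $nodes at the first p: wrap what precedes it, keep the p itself,
   then recurse on whatever follows it. :)
declare function local:partition($nodes as node()*) as node()* {
  let $idx := (for $n at $i in $nodes
               where $n instance of element(p)
               return $i)[1]
  return
    if (empty($idx)) then local:wrap($nodes)
    else (
      local:wrap(subsequence($nodes, 1, $idx - 1)),
      $nodes[$idx],
      local:partition(subsequence($nodes, $idx + 1))
    )
};

(: Copy the tree, partitioning the children of every div. :)
declare function local:transform($node as node()) as node() {
  typeswitch ($node)
    case element(div) return
      element { node-name($node) } { $node/@*, local:partition($node/node()) }
    case element() return
      element { node-name($node) } {
        $node/@*,
        for $c in $node/node() return local:transform($c)
      }
    default return $node
};

(: "input.xml" is a hypothetical file holding the XML from the question. :)
local:transform(doc("input.xml")/*)
```

Runs that contain only whitespace are dropped rather than wrapped, and whitespace inside the wrapped text nodes is left untouched.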
This yields the following:
Your expected result is inconsistent about leading and trailing whitespace in the text nodes: it is normalized for some text and not for other text. It is not clear whether you really expect that exact result; probably not.
To normalize whitespace in all text nodes, replace this:
with:
which yields:
The algorithm works both when the div contains no p and when it contains several, but I did not test every scenario.
In XQuery 3, this can be simplified with tumbling windows, for example:
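A hedged sketch of the tumbling-window idea, assuming an XQuery 3.0 processor that supports the window clause. The function name `local:wrap-div` and the file name `input.xml` are hypothetical. Each window is either a single existing `p` or a maximal run of non-`p` nodes:

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/1999/xhtml";

declare function local:wrap-div($div as element(div)) as element(div) {
  element { node-name($div) } {
    $div/@*,
    (: A window closes right away if it starts on a p; otherwise it closes
       just before the next p, or at the end of the sequence. :)
    for tumbling window $w in $div/node()
      start $s when true()
      end next $n when ($s instance of element(p))
                    or ($n instance of element(p))
    return
      if ($s instance of element(p)) then $w
      else if (normalize-space(string-join($w ! string(.), "")) = "") then ()
      else <p>{ $w }</p>
  }
};

(: "input.xml" is a hypothetical file holding the XML from the question. :)
doc("input.xml")//div ! local:wrap-div(.)
```

This variant returns only the rewritten `div` elements; rebuilding the surrounding `section` would still need a recursive copy of the tree.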
Similarly, to normalize whitespace, replace:
with something like:
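One possible shape for the normalizing step, offered as an assumption rather than as the replacement this answer originally showed: a hypothetical helper `local:normalize` that is applied to a run of nodes before the run is wrapped.

```xquery
(: Trim and collapse whitespace in each text node of $nodes, drop text
   nodes that were whitespace only, and pass other nodes through as-is. :)
declare function local:normalize($nodes as node()*) as node()* {
  for $n in $nodes
  return
    if ($n instance of text()) then
      let $t := normalize-space($n)
      return if ($t = "") then () else text { $t }
    else $n
};

(: Usage: normalize the run, then wrap it. :)
<p>{ local:normalize(text { "  test test  " }) }</p>
```

Note that trimming every text node also removes the whitespace that separates an inline element such as `ins` from the text after it, so `INS` and `text` would end up adjacent; whether that is acceptable depends on the result you actually expect.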