I'm currently working with the API Apache POI and I'm trying to edit a Word document with it (*.docx
). A document is composed by paragraphs (in XWPFParagraph
objects) and a paragraph contains text embedded in 'runs' (XWPFRun
). A paragraph can have many runs (depending on the text properties, but it's sometimes random). In my document I can have specific tags which I need to replace with data (all my tags follows this pattern <#TAG_NAME#>
)
So for example, if I process a paragraph containing the text Some text with a tag <#SOMETAG#>
, I could get something like this
XWPFParagraph paragraph = ... // Get a paragraph from the document
System.out.println(paragraph.getText());
// Prints: Some text with a tag <#SOMETAG#>
But if I want to edit the text of that paragraph I need to process the runs and the number of runs is not fixed. So if I show the content of runs with that code:
System.out.println("Number of runs: " + paragraph.getRuns().size());
for (XWPFRun run : paragraph.getRuns()) {
System.out.println(run.text());
}
Sometimes it can be like this:
// Output:
// Number of runs: 1
// Some text with a tag <#SOMETAG#>
And other time like this
// Output:
// Number of runs: 4
// Some text with a tag
// <#
// SOMETAG
// #>
What I need to do is to get the first run containing the start of the tag and the indexes of the following runs containing the rest of the tag (if the tag is divided in many runs). I've managed to get a first version of that algorithm but it only works if the beginning of the tag (<#
) and the end of the tag (#>
) aren't divided. Here's what I've already done.
So what I would like to get is an algorithm capable to manage that problem and if possible get it work with any given tag (not necessarily <#
and #>
, so I could replace with something like this {{{
and this }}}
).
Sorry if my English isn't perfect, don't hesitate to ask me to clarify any point you want.
Finally I found the answer myself, I totally changed my way of thinking my original algorithm (I commented it so it might help someone who could be in the same situation I was)
So now I'm processing all runs of the paragraph until I found my surrounded tag, then I'm processing back the paragraph to ignore the first runs if I don't need to edit them.