I'm extracting text with the WordExtractor
class (Apache POI), but I get an error for some .doc
files. While debugging, I saw that the failing line is the last one here:
HWPFDocument docx = new HWPFDocument(new FileInputStream(file));
WordExtractor we = new WordExtractor(docx);
String T = we.getText().replaceAll("\\n", " ").replaceAll("\\r", " ");
For most .docx and .doc files it works fine.
The error message is:
Exception in thread "main" java.lang.RuntimeException:
java.lang.IllegalArgumentException: The end (4958) must not be before the start (4990)
How can I fix it?
XWPFWordExtractor from docs:
So this is your problem :) And the solution, from their docs:
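
Not from the POI docs, just a minimal sketch of my own: a file's extension doesn't always match what is inside it, so it can help to check which container format the failing files really use before picking an extractor. As far as I know, HWPFDocument/WordExtractor target the binary OLE2 .doc format, while XWPFDocument/XWPFWordExtractor target the zip-based .docx format. The class name `WordFormatSniffer` and its methods are hypothetical names for illustration; the code uses only the JDK and looks at the first bytes of the file. It won't fix the exception by itself, it only tells you what kind of file you are dealing with.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Apache POI): guesses the container
// format of a Word file from its leading "magic" bytes.
public class WordFormatSniffer {

    /** Container formats this sketch distinguishes. */
    public enum Format { OLE2_DOC, OOXML_DOCX, UNKNOWN }

    // First 8 bytes of an OLE2 compound file (used by binary .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // First 4 bytes of a ZIP local file header (.docx is a ZIP archive).
    // Note: any ZIP file matches, so this is only a hint for .docx.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a header that was already read into memory. */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2_DOC;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML_DOCX;
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    public static Format detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop.
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) break; // end of file
                filled += n;
            }
        }
        byte[] actual = new byte[filled];
        System.arraycopy(header, 0, actual, 0, filled);
        return detect(actual);
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + detect(Paths.get(arg)));
        }
    }
}
```

Run it over the files that throw and the ones that don't. If a failing file reports OOXML_DOCX, the XWPF classes are the ones to try; if it reports OLE2_DOC, the file really is a binary .doc and the cause has to be looked for elsewhere.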