I am trying to find a way to convert a docx file to XHTML.
I found XDocReport, which looks good, but I have run into some issues (I am new to XDocReport).
According to their documentation on GitHub (here and here), I should be able to convert with this code:
String source = args[0];
String dest = args[1];
// 1) Create options DOCX to XHTML to select the right converter from the registry
Options options = Options.getFrom(DocumentKind.DOCX).to(ConverterTypeTo.XHTML);
// 2) Get the converter from the registry
IConverter converter = ConverterRegistry.getRegistry().getConverter(options);
// 3) Convert DOCX to (X)HTML; try-with-resources closes both streams
try (InputStream in = new FileInputStream(new File(source));
     OutputStream out = new FileOutputStream(new File(dest))) {
    converter.convert(in, out, options);
} catch (XDocConverterException | IOException e) {
    e.printStackTrace();
}
I am using these dependencies (tried different versions, like 2.0.2, 2.0.0, 1.0.6):
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.document.docx</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.template.freemarker</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.xdocreport.converter.docx.xwpf</artifactId>
    <version>2.0.2</version>
</dependency>
My issues:
- The images are missing
- The background color is missing (all pages have a non-white background color, and I need to convert that too)
How can I handle these issues? (Or how can I convert docx to xhtml using Docx4j with formats/numbering/images?)
To convert *.docx to XHTML using XDocReport with Apache POI's XWPFDocument as the source, you will need XHTMLOptions. Those options can be given an ImageManager, which sets the path for the images extracted from the XWPFDocument. Then XHTMLConverter is needed to do the conversion.
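A minimal sketch of such a conversion, assuming the 2.0.x package names (fr.opensagres.poi.xwpf.converter.core and fr.opensagres.poi.xwpf.converter.xhtml; as far as I know the 1.0.x line used org.apache.poi.xwpf.converter instead) and Apache POI on the classpath. The file names WordDocument.docx, WordDocument.html and the images folder are placeholders chosen for illustration:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import fr.opensagres.poi.xwpf.converter.core.ImageManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxToXhtml {

    public static void main(String[] args) throws Exception {
        // Placeholder paths; adjust to your environment.
        File root = new File("./");
        File docx = new File(root, "WordDocument.docx");
        File html = new File(root, "WordDocument.html");

        try (InputStream in = new FileInputStream(docx);
             XWPFDocument document = new XWPFDocument(in);
             OutputStream out = new FileOutputStream(html)) {

            // The ImageManager extracts the images into <root>/images
            // and makes the generated XHTML reference them there.
            XHTMLOptions options = XHTMLOptions.create()
                    .setImageManager(new ImageManager(root, "images"));

            XHTMLConverter.getInstance().convert(document, out, options);
        }
    }
}
```

The image folder is resolved relative to the base directory passed to ImageManager, so the XHTML file should be written into that same base directory for the relative image links to work.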
This handles images properly.
But XDocReport has so far been unable to handle the page background color of an XWPFDocument. It extracts and handles paragraph background colors, but not page background colors.
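One possible workaround is to read the page background color from the docx yourself and add it to the generated XHTML afterwards. The following is only a sketch using the JDK alone, not part of XDocReport: it assumes the color is stored as a plain hex value in the w:color attribute of the w:background element in word/document.xml (transitional namespace; theme colors and gradient fills are ignored), and that the generated XHTML contains a </head> tag. The class and method names are made up for this illustration.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxPageBackground {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the hex color of w:background (e.g. "FFCC99"), if one is set. */
    public static Optional<String> readPageBackground(Path docx) throws Exception {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                return Optional.empty();
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(entry)) {
                NodeList nodes = factory.newDocumentBuilder().parse(in)
                        .getElementsByTagNameNS(W_NS, "background");
                if (nodes.getLength() == 0) {
                    return Optional.empty();
                }
                String color = ((Element) nodes.item(0)).getAttributeNS(W_NS, "color");
                // Accept only plain RRGGBB values; "auto" or a missing attribute is skipped.
                return color.matches("[0-9A-Fa-f]{6}") ? Optional.of(color) : Optional.empty();
            }
        }
    }

    /** Inserts a body background rule just before </head>; returns the input unchanged if there is no </head>. */
    public static String addBodyBackground(String xhtml, String hexColor) {
        int headEnd = xhtml.indexOf("</head>");
        if (headEnd < 0) {
            return xhtml;
        }
        String style = "<style type=\"text/css\">body{background-color:#" + hexColor + ";}</style>";
        return xhtml.substring(0, headEnd) + style + xhtml.substring(headEnd);
    }
}
```

To use it, convert into a ByteArrayOutputStream instead of a file, pass the resulting string through addBodyBackground when readPageBackground returns a value, and then write the result to disk.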