What is the state of generating Word documents in 2021? (Language agnostic)


Okay, so this is a pretty generic and vague question, so please let me elaborate.

We have a large codebase which we have been splitting up over the past few years into smaller, self-contained libraries.

One of the larger and more unwieldy parts is our Word export module. It currently uses docx4j, but we run into memory issues with large exports that contain a lot of pictures. Besides that, it is pretty difficult to keep the exporter up to date with changes in our domain model.

It has been a while since someone worked on it (like years...), so I took it upon myself to investigate the state of generating Word documents in 2021. I hoped a lot had changed, but some Google searches led me to posts from 2010 and libraries from 2012. Of course, a library from 2012 might simply be that good.

I have identified the following solutions, though I am probably missing a lot:

  • Docx4j (JVM). Still maintained, but we run into memory problems with it.
  • Docx4j with Content Control Data Binding. Seems to be some way to use templating?
  • Apache POI (JVM). I have some okay experience with the Excel part, but none with the Word part. The 'consensus' online appears to be that Docx4j is more user-friendly.
  • JasperReports. Don't know anything about that.
  • DocX, .NET library, no experience.
  • Office Add-In using Office.js (JS). Official API from Microsoft. Runs client-side in Word, so it requires a connection to an API.
  • docxtemplates (Node / Browser). No experience. Looks complete, don't know about performance though.
  • officegen (Node). Last release 2019.
  • Carbone (Node). https://github.com/Ideolys/carbone. No experience either.
  • probably more...

So, as expected, a lot of JS libraries are popping up as well.

Looking at my requirements:

  • using a template would be nice
  • running it as a service would be nice
  • efficient (memory-wise; I don't mind if it takes some time to generate)

We have quite a good JSON API available, which is very easy to maintain and maps pretty well to our domain model. My preference would of course be to use that as the source.
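
To make these requirements concrete, here is a minimal sketch (Python, standard library only) of the kind of template-based, memory-friendly export I have in mind. It is not what we run today and not the API of any library listed above. It relies on a .docx file being a ZIP archive whose main body normally lives in word/document.xml. The function name render_docx and the {{key}} placeholder syntax are my own assumptions, and it assumes each placeholder sits in a single text run, which Word does not guarantee.

```python
import shutil
import zipfile
from xml.sax.saxutils import escape


def render_docx(template, output, data):
    """Copy a .docx template to `output`, filling {{key}} placeholders.

    `template` and `output` may be file paths or binary file-like objects.
    `data` is a flat dict, e.g. the result of json.loads() on an API response.
    Hypothetical helper: the name and the placeholder syntax are illustrative.
    """
    with zipfile.ZipFile(template) as src, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == "word/document.xml":
                # Only the main document part is loaded into memory as text.
                xml = src.read(info).decode("utf-8")
                for key, value in data.items():
                    # Escape values so characters like & or < keep the XML valid.
                    xml = xml.replace("{{" + key + "}}", escape(str(value)))
                dst.writestr(info.filename, xml)
            else:
                # Every other part (images, styles, ...) is copied in 64 KiB
                # chunks, so large pictures are never held in memory whole.
                with src.open(info) as part, dst.open(info.filename, "w") as out:
                    shutil.copyfileobj(part, out, 64 * 1024)
```

Something like render_docx("template.docx", "export.docx", json.loads(api_response)) could then sit behind a small service endpoint. Real templating libraries handle what this skips: placeholders split across runs, loops, tables and inserting new images.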

What are people's experiences, and am I missing some very good libraries out there?
