I have a model which generates output in the form of numpy arrays, text and plots. It currently holds this output as a dictionary.
The output needs to be presented in a variety of formats, particularly PDF, Word and Excel.
My solution has been to write all data to an HTML string and export the HTML to a PDF using WeasyPrint. I would then export the table sections of the HTML to Excel. This works okay, but it's messy.
Is there an easier way to do this? In my mind, perhaps there is a module which would let you store the information in a dictionary and declare its data type, and then a process would handle its formatting and exporting to the various formats.
I wanted to answer my own question, to demonstrate what I implemented as a solution.
Because the data formats were multimedia (text, numbers, plots), I made two approaches:

- a `Report` class, which had the capability of exporting txt, html, docx and pdf
- a `Workbook` class, which had the capability of exporting xlsx and csv

Both classes inherited the same data structure, which was a nested dictionary containing numbers and metadata. The `Report` class then grabbed additional text, and created plots from the data.

For example, the data resembled this structure:
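As a hypothetical illustration only (the keys, the values and the `Output` base class below are assumptions, not the real model output), a nested dictionary of numbers and metadata, shared by both classes through a common base, might look like this:

```python
# Hypothetical example data: every key and value here is illustrative.
results = {
    "metadata": {"model": "example_model", "units": "m/s"},
    "sections": {
        "summary": {
            "run_1": {"mean": 1.5, "max": 3.0},
            "run_2": {"mean": 2.0, "max": 4.5},
        },
        "errors": {
            "run_1": {"rmse": 0.12},
            "run_2": {"rmse": 0.09},
        },
    },
}


class Output:
    """Assumed shared base: holds the nested dictionary both exporters read."""

    def __init__(self, data):
        self.metadata = data["metadata"]
        self.sections = data["sections"]

    def tables(self):
        """Yield (section name, rows) pairs, one per table-like group."""
        yield from self.sections.items()


class Report(Output):
    """Would add text and plots, then export txt, html, docx and pdf."""


class Workbook(Output):
    """Would export the numeric sections to xlsx and csv."""


for name, rows in Report(results).tables():
    print(name, list(rows))
```

Keeping the data in one place means each exporter only decides how to present it, not how to collect it.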
The `Report` class built an HTML string using Dominate, and could either export as HTML by rendering this, PDF by feeding the rendered HTML into WeasyPrint, or to Docx (or some other format theoretically) by converting the rendered HTML to Docx via PyPandoc.

The `Workbook` class iterated through dictionaries of values and wrote groups of these values to Pandas dataframes, and exported them to a workbook using `pd.ExcelWriter`. The same dataframes could be exported to csv, and compressed into a zip file using an adapted solution found here.
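A minimal sketch of both export paths is given here, and it is not the original implementation. The `DATA` dictionary, the `to_frames` helper and all method names are assumptions made for illustration. Dominate, WeasyPrint and PyPandoc are imported lazily inside the methods that need them, PyPandoc additionally needs a pandoc binary, and `pd.ExcelWriter` needs an Excel engine such as openpyxl.

```python
import io
import zipfile

import pandas as pd

# Hypothetical data: section name -> {row label -> {column -> value}}.
DATA = {
    "summary": {"run_1": {"mean": 1.5, "max": 3.0},
                "run_2": {"mean": 2.0, "max": 4.5}},
    "errors": {"run_1": {"rmse": 0.12}, "run_2": {"rmse": 0.09}},
}


def to_frames(data):
    """Turn each group of values into one DataFrame (one row per label)."""
    return {name: pd.DataFrame.from_dict(rows, orient="index")
            for name, rows in data.items()}


class Report:
    """Builds one HTML string, then converts it to the other formats."""

    def __init__(self, data, title="Model report"):
        self.data, self.title = data, title

    def to_html(self):
        import dominate  # third-party, imported lazily
        from dominate import tags

        doc = dominate.document(title=self.title)
        with doc:
            tags.h1(self.title)
            for name, frame in to_frames(self.data).items():
                tags.h2(name)
                with tags.table():
                    with tags.tr():  # header row
                        tags.th("")
                        for col in frame.columns:
                            tags.th(str(col))
                    for label, row in frame.iterrows():
                        with tags.tr():
                            tags.td(str(label))
                            for value in row:
                                tags.td(str(value))
        return doc.render()

    def to_pdf(self, path):
        from weasyprint import HTML  # third-party
        HTML(string=self.to_html()).write_pdf(path)

    def to_docx(self, path):
        import pypandoc  # third-party; requires pandoc to be installed
        pypandoc.convert_text(self.to_html(), "docx", format="html",
                              outputfile=path)


class Workbook:
    """Writes one sheet (or one CSV file) per group of values."""

    def __init__(self, data):
        self.frames = to_frames(data)

    def to_xlsx(self, path):
        with pd.ExcelWriter(path) as writer:
            for name, frame in self.frames.items():
                frame.to_excel(writer, sheet_name=name)

    def to_csv_zip(self):
        """Return the bytes of a zip archive holding one CSV per frame."""
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, frame in self.frames.items():
                archive.writestr(f"{name}.csv", frame.to_csv())
        return buffer.getvalue()
```

Under these assumptions, `Workbook(DATA).to_csv_zip()` returns bytes that can be written to disk or sent as a download, and `Report(DATA).to_pdf("report.pdf")` writes the PDF from the same dictionary. Using HTML as the single intermediate format keeps the layout logic in one place, at the cost of depending on how well each converter handles that HTML.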