I currently have a program that generates a .docx document using the python-docx library.
Upon completing the building of the .docx file I save it into a Bytestream as so
file_stream = io.BytesIO()
document.save(file_stream)
file_stream.seek(0)
Now, I need to convert this word document into a PDF. I have looked at a few different libraries for conversion such as docx2pdf or even doing it manually using comtypes as so
import sys
import os
import comtypes.client
wdFormatPDF = 17
in_file = "Input_file_path.docx"
out_file = "output_file_path.pdf"
word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
The problem is, I need to do this conversion in memory and cannot physically save the DOCX or the PDF to the machine. Every converter I've seen requires a filepath to the physical document on the machine and I do not have that.
Is there a way I can convert the DOCX filestream into a PDF stream just in memory?
Thanks
This method is a little convoluted, but it works entirely in memory, and you get the option to add custom CSS to style the final document.
Convert the DOCX bytestream to HTML using mammoth, and the resulting HTML to PDF using pdfkit.
Here's an example
If you want to output the stream to a file, just do