Convert DOCX Bytestream to PDF Bytestream Python

2.8k Views Asked by johnstoia At 04 September 2020 at 22:09

I currently have a program that generates a .docx document using the python-docx library.

Upon completing the building of the .docx file I save it into a Bytestream as so

file_stream = io.BytesIO()
document.save(file_stream)
file_stream.seek(0)

Now, I need to convert this word document into a PDF. I have looked at a few different libraries for conversion such as docx2pdf or even doing it manually using comtypes as so

import sys
import os
import comtypes.client

wdFormatPDF = 17

in_file = "Input_file_path.docx"
out_file = "output_file_path.pdf"

word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()

The problem is, I need to do this conversion in memory and cannot physically save the DOCX or the PDF to the machine. Every converter I've seen requires a filepath to the physical document on the machine and I do not have that.

Is there a way I can convert the DOCX filestream into a PDF stream just in memory?

Thanks

Original Q&A

There are 2 best solutions below

Matias Agelvis On 16 November 2021 at 21:33 BEST ANSWER

This method is a little convoluted, but it works entirely in memory, and you get the option to add custom CSS to style the final document.
Convert the DOCX bytestream to HTML using mammoth, and the resulting HTML to PDF using pdfkit.

Here's an example

# create a dummy docx file
from docx import Document
document = Document()
document.add_paragraph('Lorem ipsum dolor sit amet.')

# create a bytestream
import io
file_stream = io.BytesIO()
document.save(file_stream)
file_stream.seek(0)

# convert the docx to html
import mammoth
result = mammoth.convert_to_html(file_stream)

# >>> result.value
# >>> '<p>Lorem ipsum dolor sit amet.</p>'

# convert html to pdf
import pdfkit
pdf = pdfkit.from_string(result.value)

If you want to output the stream to a file, just do

with open('test.pdf','wb') as file:
    file.write(pdf)

Revisto On 04 September 2020 at 22:15

i think you want to convert docx file to pdf file , then follow these steps:

Install the package:

pip install docx2pdf

Code:

from docx2pdf import convert

convert("input.docx")
convert("input.docx", "output.pdf")
convert("my_docx_folder/")

Thanks and if I am mistaken please tell me to correct it

Convert DOCX Bytestream to PDF Bytestream Python

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PDF

Related Questions in IO

Related Questions in DOCX

Related Questions in BYTESTREAM

Trending Questions

Popular # Hahtags

Popular Questions