I am using the xhtml2pdf library to convert my webpage into a PDF document. The library itself works fine, but because the webpage content is Khmer Unicode, the text in the generated PDF is not displayed correctly.
Here is the resulting PDF file:
The correct content should be:
ភាសាខ្មែរ
នេះគឺជាអត្ថបទសាមញ្ញជាភាសាខ្មែរដែលបានសរសេរជាអក្សរខ្មែរយូនីកូដ។
Here is my Python code to convert it to a PDF file:
from xhtml2pdf import pisa
import io


def convert_html_to_pdf(source_html, output_filename):
    # Write the rendered PDF to the output file in binary mode.
    with open(output_filename, "wb") as output_file:
        pdf_status = pisa.CreatePDF(source_html, dest=output_file)
    # pisa reports problems through the err attribute; zero means success.
    return not pdf_status.err


if __name__ == "__main__":
    # Read the HTML as UTF-8 so the Khmer text is decoded correctly.
    with io.open("khmer_unicode.html", "r", encoding="utf-8") as html_file:
        html_content = html_file.read()
    pdf_output = "khmer_unicode.pdf"
    if convert_html_to_pdf(html_content, pdf_output):
        print("PDF file has been created at " + pdf_output)
    else:
        print("Error generating PDF")
And my HTML file:
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>sample file</title>
    <style>
        @font-face {
            font-family: 'KhmerOS';
            src: url('KhmerOS.ttf');
        }
        body {
            font-family: 'KhmerOS', Arial, sans-serif;
        }
    </style>
</head>
<body>
    <h1>ភាសាខ្មែរ</h1>
    <p>នេះគឺជាអត្ថបទសាមញ្ញជាភាសាខ្មែរដែលបានសរសេរជាអក្សរខ្មែរយូនីកូដ។</p>
</body>
</html>
The font file 'KhmerOS.ttf' is in the same directory.
Do I have to encode anything else to make the Khmer Unicode characters display correctly in the converted file? Please kindly help, thanks.

Reading "Fonts and encodings", you might have to use the pdfmetrics module from the reportlab package to register the font first, before using it with xhtml2pdf. Registering KhmerOS.ttf with reportlab should then make it available for xhtml2pdf to use when creating the PDF.

If the Khmer Unicode characters are still not displaying correctly, it could be an issue with the font file itself.
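
The font registration described above could be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code originally intended: the helper name register_khmer_font and its default arguments are made up for illustration, and it assumes reportlab is installed and that KhmerOS.ttf sits in the working directory, as in the question.

```python
import os


def register_khmer_font(font_path="KhmerOS.ttf", font_name="KhmerOS"):
    """Register a TrueType font with reportlab under font_name.

    Hypothetical helper for illustration; the default path and name
    are taken from the question and may differ in your setup.
    """
    if not os.path.isfile(font_path):
        raise FileNotFoundError("Font file not found: " + font_path)

    # Third-party imports are done here so that a missing font file is
    # reported before reportlab is touched.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    # After this call the font can be referred to by font_name.
    pdfmetrics.registerFont(TTFont(font_name, font_path))
    return font_name


if __name__ == "__main__":
    register_khmer_font()
```

Call register_khmer_font() once, before pisa.CreatePDF runs, so the font is already registered when the HTML is rendered.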
If registering the font does not fix the output, your current processing pipeline needs to change.
You can try converting the text to a format that is known to work well with xhtml2pdf. For example, you could try converting the Khmer Unicode text to HTML entities.

Or you might consider using a PDF creation library (like reportlab) directly, without going through HTML. You could create a PDF template and then insert your text into that template.

But you can also try a different HTML-to-PDF library, like pdfkit or weasyprint. These libraries may handle Unicode characters more reliably.

To use pdfkit, you will need to install wkhtmltopdf. Then install the pdfkit Python package with pip install pdfkit.
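
A pdfkit conversion could look roughly like the sketch below. The names convert_with_pdfkit and PDFKIT_OPTIONS are made up for illustration, and the file names are taken from the question. It assumes the wkhtmltopdf binary is installed and on the PATH. Whether the Khmer text then renders correctly still depends on the font being available to wkhtmltopdf.

```python
import os

# Options are passed through to wkhtmltopdf; "encoding" tells it how
# to read the input HTML.
PDFKIT_OPTIONS = {"encoding": "UTF-8"}


def convert_with_pdfkit(html_path, pdf_path, options=None):
    """Convert an HTML file to PDF with pdfkit (hypothetical helper)."""
    if not os.path.isfile(html_path):
        raise FileNotFoundError("HTML file not found: " + html_path)

    # Third-party import; pdfkit is a thin wrapper around wkhtmltopdf.
    import pdfkit

    return pdfkit.from_file(html_path, pdf_path, options=options or PDFKIT_OPTIONS)


if __name__ == "__main__":
    if convert_with_pdfkit("khmer_unicode.html", "khmer_unicode.pdf"):
        print("PDF file has been created at khmer_unicode.pdf")
    else:
        print("Error generating PDF")
```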