iText not rendering Chinese/Korean characters


I have an HTML string containing Chinese and Korean characters, and I want to convert it to PDF using iText. I have read that the font must be embedded in the PDF for the Unicode characters to show up.

When I embed wts11.ttf (with encoding IDENTITY_H) or STSong-Light (with encoding UniGB-UCS2-H), I see only the Chinese characters, not the Korean ones. I also tried arialuni.ttf (with encoding IDENTITY_H), but again only the Chinese characters show up, not the Korean ones.

Can someone please tell me which font I should use, or whether I am missing something?

Below is the code snippet:

Document document = new Document();
Paragraph paragraph=new Paragraph();
PdfWriter.getInstance(document, baos);
document.open();
BaseFont bff = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.EMBEDDED);
Font f = new Font(bff);

// FontFactory.registerDirectories(); 
// Font f = FontFactory.getFont("Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

document.add(new Paragraph());
HTMLWorker htmlWorker = new HTMLWorker(document);

List<Element> objects=htmlWorker.parseToList(new StringReader(message),null);
paragraph.setFont(f);
for (Element elem : objects) {
    paragraph.add(elem);
}
document.add(paragraph);

There are 2 best solutions below

  1. Download the Malgun-Gothic-Bold_29380.ttf font.
  2. Store it at assets/fonts/Malgun-Gothic-Bold_29380.ttf.
  3. The following code works for CJK, English, and Vietnamese text:

Font fontbold = FontFactory.getFont("assets/fonts/Malgun-Gothic-Bold_29380.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);
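
Whichever font is chosen, it has to contain glyphs for every script in the text. Chinese (Han) and Korean (Hangul) are separate Unicode scripts, and a font aimed at Simplified Chinese such as STSong-Light does not cover Hangul. As a rough diagnostic, the sketch below lists the scripts present in a string using only the JDK. The class name `ScriptCheck` and the method `scriptsIn` are made up for illustration and are not part of iText; the sample text is written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of iText: reports which Unicode scripts a string uses.
final class ScriptCheck {

    // Collects the script of every code point, skipping script-neutral
    // characters such as spaces, digits, and punctuation.
    static Set<UnicodeScript> scriptsIn(String text) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // Two Han characters, an English phrase, and two Hangul syllables.
        String sample = "\u9577\u7A7A (Broken Sword) \uBE48\uC9D1";
        System.out.println(scriptsIn(sample));
    }
}
```

If the result contains both HAN and HANGUL, the font (or combination of fonts) passed to iText needs glyphs for both.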


There are different ways to solve this problem if you upgrade to using XML Worker.

I reused the code from the official examples, more specifically the ParseHtmlAsian example, and I adapted the HTML that is used as the source for this example like this:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body>
    <p><span style="font-size:12.0pt; font-family:MS Mincho">長空</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Broken Sword),</span>
    <span style="font-size:12.0pt; font-family:MS Mincho">秦王殘劍</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Flying Snow),</span>
    <span style="font-size:12.0pt; font-family:MS Mincho">飛雪</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Moon), </span>
    <span style="font-size:12.0pt; font-family:MS Mincho">如月</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(the King), and</span>
    <span style="font-size:12.0pt; font-family:MS Mincho">秦王</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Sky).</span></p>
    <p style="font-size: 12.0pt; font-family:Batang">빈집</p>
    <p>Test</p>
    </body>
</html>

The result looks like this:

[image: screenshot of the resulting PDF]

As you can see, all the text is rendered correctly, so please do not spread incorrect messages such as "iText not rendering Chinese/Korean characters" ;-)

Please forward this answer to your management so that your CTO understands that investing time in an old iText version is more expensive than buying a license to use the new iText version.
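
The sample HTML in this answer assigns a font family to each run of text: MS Mincho for the Chinese runs, Batang for the Korean paragraph, and Times New Roman for the Latin text. If the input is a plain string without such markup, something along the following lines could generate it. This is only a sketch under stated assumptions: `FontSpans`, `fontFor`, and `wrap` are hypothetical names, the script-to-font mapping simply mirrors the HTML above, and the named fonts still have to be available to XML Worker at conversion time.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper, not part of iText: wraps runs of text in <span> tags
// whose font-family depends on the Unicode script of the run.
final class FontSpans {

    // Assumed mapping, copied from the font names used in the sample HTML.
    static String fontFor(UnicodeScript script) {
        switch (script) {
            case HAN:    return "MS Mincho";
            case HANGUL: return "Batang";
            default:     return "Times New Roman,serif";
        }
    }

    static String wrap(String text) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        String runFont = null;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Spaces and punctuation stay in the current run instead of starting a new one.
            boolean neutral = script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED;
            String font = (neutral && runFont != null) ? runFont : fontFor(script);
            if (runFont != null && !font.equals(runFont)) {
                appendSpan(out, runFont, run);
                run.setLength(0);
            }
            runFont = font;
            run.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (run.length() > 0) {
            appendSpan(out, runFont, run);
        }
        return out.toString();
    }

    private static void appendSpan(StringBuilder out, String font, CharSequence run) {
        out.append("<span style=\"font-family:").append(font).append("\">")
           .append(escape(run.toString()))
           .append("</span>");
    }

    // Minimal escaping so the text cannot break the generated markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling `FontSpans.wrap(message)` and placing the result inside a `<p>` element yields markup of the same shape as the example above, which can then be passed to XML Worker.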