FONT ISSUES WITH PDF TO HTML CONVERSION
- All "ti", "fi", and "tt" character pairs are missing
- Font overlapping issue
- NOTE: I don't get this issue in Firefox. I get the above issues in Chrome and Safari.
I AM USING
- Version 0.13.6 of pdf2htmlEX
- The following command to convert PDF to HTML:
pdf2htmlEX --split-pages 1 --zoom 3 --fit-width 920 --correct-text-visibility 1 --dest-dir $1 $2 2>&1
TRIED
Using the --fallback 1 option solves all of the above problems, but:
- The fallback option reduces the clarity of the document.
- The table in the page disappears and is replaced with empty space.
DOUBTS
Could you please explain a bit more about the fallback option?
I have tried the approach above (using fallback). Please suggest a different approach if you would prefer one for solving the font problems above.
I am getting the above issues in Chrome and Safari, whereas in Firefox it works fine.
The issue occurs only in WebKit-based browsers such as Chrome and Safari, which apply ligatures to this text, whereas Firefox apparently does not.
Root cause
The missing characters are caused by the ligature support in these modern browsers. Let me explain how:
1. During conversion, the tool maps characters to glyphs, using poppler for rendering. When these browsers come across character sequences such as tt, tf, ti, ff, or fi, they treat them as ligatures and look for a single glyph corresponding to "tt" rather than for "t" and "t" separately.
2. Since no corresponding ligature glyphs are available, the browsers skip those characters and render the rest, so we find the characters missing.
Could be solved by
Disabling (turning off) ligatures in these browsers by embedding the CSS in the generated content.
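
A minimal sketch of that idea, assuming the generated pages can be post-processed after pdf2htmlEX has run. It inserts a style block that asks the browser not to substitute ligatures. The function name disable_ligatures and the catch-all "*" selector are my own assumptions, not part of pdf2htmlEX, and I have not verified this against real pdf2htmlEX output.

```python
import re

# CSS asking the browser not to replace character pairs with ligature glyphs.
# The "*" selector is an assumption; a narrower selector may be enough.
NO_LIGATURES_CSS = """<style>
* {
  -webkit-font-variant-ligatures: none;
  font-variant-ligatures: none;
  -webkit-font-feature-settings: "liga" 0, "clig" 0;
  font-feature-settings: "liga" 0, "clig" 0;
}
</style>
"""


def disable_ligatures(html: str) -> str:
    """Return html with the style block inserted just before </head>."""
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        # No closing head tag found: prepend the style block instead.
        return NO_LIGATURES_CSS + html
    return html[:match.start()] + NO_LIGATURES_CSS + html[match.start():]
```

To use it, read each generated HTML file as text, pass the contents through disable_ligatures, and write the result back. Placing the block just before the closing head tag keeps it after the styles already in the head, so it is not overridden by an earlier rule of equal specificity.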
For more details please refer:
Please correct me if I am wrong.