Parse PDF Fails Because of Missing OpenType Layout Tables

I need to parse a PDF file with PDFBox (version 2.0.7), but all I get is a lot of log messages like these:

Sep 02, 2017 10:18:24 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 WARNING: Found CFF/OTF but expected embedded TTF font AAAAAC+UniversLTStd-LightCn

Sep 02, 2017 10:18:24 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 INFO: OpenType Layout tables used in font AAAAAC+UniversLTStd-LightCn are not implemented in PDFBox and will be ignored

Is there any way to solve this, e.g. by loading a certain font before parsing the PDF, or is there no chance that I can parse this document? Alternatively, is there another PDF parsing framework I could have better luck with?

Thanks for any help.

There is 1 solution below

I assume you used the ExtractText command-line utility. By default, PDFBox's ExtractText writes the extracted text to a file (same name as your input file, but with the suffix ".txt"). It does not print the extracted text to the console, so the WARNING and INFO log messages are all you see there; note that neither of them is logged at error level. If you want to see the extracted text as console output, you have to specify the parameter "-console".
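
To make the two behaviours concrete, here is a minimal Java sketch that assembles the ExtractText command line with and without "-console" and works out where the text file would land otherwise. The class and method names are made up for illustration, the jar path is a hypothetical location you would adjust, and the ".pdf" to ".txt" rule is only my reading of "same name with suffix .txt", not a statement about PDFBox internals.

```java
import java.util.ArrayList;
import java.util.List;

class ExtractTextCommand {

    // Hypothetical path to the PDFBox command-line jar; adjust to your setup.
    static final String APP_JAR = "pdfbox-app-2.0.7.jar";

    // Builds: java -jar <jar> ExtractText [-console] <inputfile>
    static List<String> buildCommand(String pdfPath, boolean toConsole) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(APP_JAR);
        cmd.add("ExtractText");
        if (toConsole) {
            cmd.add("-console"); // print extracted text instead of writing a file
        }
        cmd.add(pdfPath);
        return cmd;
    }

    // Assumed naming rule for the default output file: swap a trailing
    // ".pdf" for ".txt", or append ".txt" if there is no such extension.
    static String defaultOutputPath(String pdfPath) {
        if (pdfPath.toLowerCase().endsWith(".pdf")) {
            return pdfPath.substring(0, pdfPath.length() - 4) + ".txt";
        }
        return pdfPath + ".txt";
    }

    public static void main(String[] args) throws Exception {
        String pdf = args.length > 0 ? args[0] : "input.pdf";
        List<String> cmd = buildCommand(pdf, true);
        System.out.println("Command: " + String.join(" ", cmd));
        System.out.println("Without -console, look for: " + defaultOutputPath(pdf));
        if (args.length > 0) {
            // Only run the tool when a real file was passed in;
            // inheritIO() forwards the extracted text and the log messages.
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

Run without arguments, it only prints the command and the expected output file name; run with a PDF path, it launches the utility so that the extracted text appears on the console alongside the log messages.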