I need to parse a PDF file with PDFBox (version 2.0.7), but all I get is a lot of warnings like these:
Sep 02, 2017 10:18:24 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 WARNING: Found CFF/OTF but expected embedded TTF font AAAAAC+UniversLTStd-LightCn
Sep 02, 2017 10:18:24 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 INFO: OpenType Layout tables used in font AAAAAC+UniversLTStd-LightCn are not implemented in PDFBox and will be ignored
Is there any way to solve this problem, e.g. by loading a certain font before parsing the PDF, or is there no chance of parsing this document? Alternatively, is there another PDF parsing framework I could try with better luck?
Thanks for any help.
I assume you used the ExtractText command-line utility. By default, PDFBox's ExtractText utility writes the extracted text to a file (same name as your input file, but with the suffix ".txt"). It does not print the extracted text to the console. If you want to see the extracted text as console output, you have to specify the "-console" parameter.
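If you call PDFBox from your own code and the font messages are just noise to you, one option is to raise the log level for the PDFBox packages. This is only a minimal sketch, and it assumes that PDFBox's logging ends up in java.util.logging, which the format of the messages in the question suggests. With a different logging backend (e.g. log4j) you would have to change that backend's configuration instead. The class name PdfBoxLogQuieter and the method silenceFontWarnings are made up for this example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class; the name is not part of PDFBox.
final class PdfBoxLogQuieter {

    // Keep a strong reference: java.util.logging holds loggers only weakly,
    // so a configured logger could otherwise be garbage-collected.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    // Assumption: PDFBox messages are routed to java.util.logging.
    // Loggers below "org.apache.pdfbox" inherit this level unless they
    // have a level of their own, so WARNING and INFO messages are dropped.
    static void silenceFontWarnings() {
        PDFBOX_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        silenceFontWarnings();
        // Load the document and extract the text here.
    }
}
```

Call silenceFontWarnings() once before loading the document. This only hides the messages; it does not change what text PDFBox extracts.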