I have seen a lot of people with similar issues, but not this exact one, and many of those questions unfortunately have no applicable solution.
I am getting this warning from tabula, and when I look at the result or check the length of what it extracts, there is nothing there. Here is the message:
Got stderr: Apr 12, 2022 5:34:12 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
WARNING: Using fallback font 'Helvetica-Oblique' for 'CenturyGothic-Italic'
All I am using is:
table = tabula.read_pdf(pdf_path, pages=page, multiple_tables=True)
Any ideas??
The correct approach would be to install the missing fonts, as recommended in the answer here: Using fallback font while parsing file content using pdfbox - can it cause mistakes?
However, for my application, which reads PDF files from a Docker container, installing extra fonts in the OS might be unnecessary. What you see in the logs is only a warning, and the missing fonts do not really impact the parsing of the PDF.
To remove these warnings from the tabula-py log output, I just added
silent=True
to the arguments in the method call as follows:
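A minimal sketch of that call, reusing pdf_path and page from the question. The wrapper name read_tables_quietly is hypothetical, and tabula is imported inside the function only so that the snippet still loads on a machine without tabula-py:

```python
def read_tables_quietly(pdf_path, page):
    """Read tables from one page of a PDF without printing Java/PDFBox stderr output.

    Hypothetical helper for illustration; the only change from the call in the
    question is the extra silent=True argument.
    """
    # Imported lazily (an assumption of this sketch) so the module can be
    # loaded even where tabula-py is not installed.
    import tabula

    return tabula.read_pdf(
        pdf_path,
        pages=page,
        multiple_tables=True,
        silent=True,  # suppress stderr output such as the fallback-font warning
    )


# Example usage (needs tabula-py, Java, and a real PDF):
# tables = read_tables_quietly("report.pdf", 3)
# print(len(tables))
```

Note that silent=True only hides the messages written to stderr. It does not change what tabula extracts, so if the result is still empty, the cause is likely somewhere else (for example the page number or the table detection settings).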