I am trying to use the library pdf2image in a Code Repository on Palantir Foundry and getting the error
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
when using the function convert_from_bytes.
Does anyone know how to reference the poppler path and get rid of this error?
Thanks!
Here is the code:
def extract_pdf_text(input_bytes, language='eng', dpi=200):
pages = convert_from_bytes(input_bytes, dpi)
pdf_pages = ''
for page_index, page in enumerate(pages):
pdf_page = pytesseract.image_to_string(page, lang=language)
pdf_pages = pdf_pages + pdf_page
return pdf_pages
And the meta.yaml for reference:
# If you need to modify the runtime requirements for your package,
# update the 'requirements.run' section in this file
package:
name: "{{ PACKAGE_NAME }}"
version: "{{ PACKAGE_VERSION }}"
source:
path: ../src
requirements:
# Tools required to build the package. These packages are run on the build system and include
# things such as revision control systems (Git, SVN) make tools (GNU make, Autotool, CMake) and
# compilers (real cross, pseudo-cross, or native when not cross-compiling), and any source pre-processors.
# https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#build
build:
- python 3.8.*
- setuptools
# Packages required to run the package. These are the dependencies that are installed automatically
# whenever the package is installed.
# https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#run
run:
- python 3.8.*
- transforms {{ PYTHON_TRANSFORMS_VERSION }}
- transforms-expectations
- transforms-verbs
- pytesseract
- pdfplumber
- googletrans
- regex
- pdf2image
- langdetect
- pandas
- numpy
- selenium
- requests
- pypdf2
- poppler
build:
script: python setup.py install --single-version-externally-managed --record=record.txt
I found the problem when inspecting the CI-Checks. They failed before poppler was pulled. After I cleaned up meta.yaml and the checks succeded everything seems to work fine.