Poppler Installation on Google Colab


I am trying to convert a PDF to images using the pdf2image module on Google Colab. I have downloaded the latest version of poppler and also installed poppler-utils. In convert_from_path(), I passed the correct path to poppler's bin directory, but I still get FileNotFoundError and PDFInfoNotInstalledError.

Refer to the attached screenshot of the error for more clarity.


There are 2 best solutions below

0
On

On Colab, install poppler-utils with the commands below, then call convert_from_path.

# Packages to be installed (Colab cells run as root, so sudo is not needed)
!apt-get update
!apt-get install -y poppler-utils

Then try pages = convert_from_path('filename', 500), where 500 is the DPI.

I recommend this approach because it worked better for me than the other answers. If it still fails after installing the package, restart the runtime and run convert_from_path again.
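If the error persists, it can help to confirm that the poppler command-line tools are actually visible on the PATH before calling pdf2image. The snippet below is a minimal sketch of such a check; missing_poppler_tools is a hypothetical helper name, and the tool list assumes pdf2image relies on pdfinfo and pdftoppm.

```python
import shutil

# Command-line tools that pdf2image is assumed to shell out to.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_poppler_tools()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("poppler tools found; convert_from_path should not need poppler_path")
```

If the list is empty, PDFInfoNotInstalledError is more likely caused by a wrong poppler_path argument than by a missing installation.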

0
On

AFAIK, Google Colab runs on Ubuntu. You can confirm that it is a Linux system by running the uname -a command.

If you build poppler with the install prefix /usr, as in the steps below, the pdf* binaries (pdfinfo, pdftoppm, and so on) are installed in /usr/bin, which is already on the PATH, so pdf2image can resolve them automatically.

Discover the operating system name.

!uname -a;
Linux d9b9a62155f2 5.10.133+ #1 SMP Fri Aug 26 08:44:51 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
!cat requirements.txt
pdf2image
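The uname output above shows the kernel but not the distribution name. A minimal sketch for reading the distribution from /etc/os-release follows; parse_os_release is a hypothetical helper, and the sketch assumes the file exists, which is typical on Ubuntu but not guaranteed everywhere.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


os_release = Path("/etc/os-release")
if os_release.exists():
    info = parse_os_release(os_release.read_text())
    print(info.get("PRETTY_NAME", "unknown distribution"))
else:
    print("/etc/os-release not found")
```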

Install python dependencies

!pip install -r requirements.txt

Install some dependencies for building poppler

!apt-get update
!apt-get install -y libnss3 libnss3-dev
!apt-get install -y libcairo2-dev libjpeg-dev libgif-dev
!apt-get install -y cmake libblkid-dev e2fslibs-dev libboost-all-dev libaudit-dev

Download and extract poppler source code.

!wget https://poppler.freedesktop.org/poppler-21.09.0.tar.xz;
!tar -xvf poppler-21.09.0.tar.xz;

Compile and install poppler.

!mkdir -p poppler-21.09.0/build && \
cd poppler-21.09.0/build && \
cmake  -DCMAKE_BUILD_TYPE=Release   \
       -DCMAKE_INSTALL_PREFIX=/usr  \
       -DTESTDATADIR=$PWD/testfiles \
       -DENABLE_UNSTABLE_API_ABI_HEADERS=ON \
       .. && \
make && \
make install
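After the install step, it may be worth checking which poppler version the notebook actually picks up. The sketch below asks a tool for its version with the -v flag; poppler_version is a hypothetical helper, and it reads both stderr and stdout because the stream the tools print to is an assumption here.

```python
import subprocess


def poppler_version(tool="pdftoppm"):
    """Return the first line printed by `tool -v`, or None if the tool is not found."""
    try:
        result = subprocess.run([tool, "-v"], capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is not on PATH.
        return None
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


print(poppler_version())
```

A result of None suggests the binaries did not land in a directory on the PATH.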

Work with the PDF file

from pdf2image import convert_from_path

# poppler_path is optional here because /usr/bin is already on the PATH
images = convert_from_path('sample.pdf', poppler_path='/usr/bin/')
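convert_from_path returns a list of PIL images, one per page, so a natural next step is writing them to disk. The following is a minimal sketch; save_pages and convert_and_save are hypothetical helper names, and the output directory and file prefix are illustrative choices.

```python
from pathlib import Path


def save_pages(pages, out_dir, prefix="page"):
    """Save each page image as <prefix>_<n>.png in out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, page in enumerate(pages, start=1):
        path = out / f"{prefix}_{number}.png"
        page.save(path, "PNG")  # same call shape as PIL.Image.Image.save
        paths.append(path)
    return paths


def convert_and_save(pdf_path, out_dir):
    """Convert a PDF with pdf2image and save every page as a PNG."""
    # Imported lazily so save_pages stays usable without pdf2image installed.
    from pdf2image import convert_from_path
    return save_pages(convert_from_path(pdf_path), out_dir)
```

For example, convert_and_save('sample.pdf', 'pages') would write pages/page_1.png, pages/page_2.png, and so on.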