Tabula CalledProcessError: returned non-zero exit status 2. Tried everything possible


I keep getting this error while using Tabula in Python.

I've gone through every Stack Overflow question related to this, and blog posts as well.

My JDK/JRE is up to date:

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

My PATH is correctly defined in the environment variables.

Python version (running on Anaconda):

Python 3.6.5 |Anaconda, Inc

df = tabula.read_pdf("C:\XXXXX\PDFExtractor\Test.pdf")

I've tried passing an encoding as well.

Tabula CalledProcessError:  Command '['java', '-jar', 'C:\\Users\\xxxxx\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\tabula\\tabula-1.0.1-jar-with-dependencies.jar', '--pages', '1', '--guess', 'C:\\Users\\xxxxxx\\PDFExtractor\\Test.pdf']' returned non-zero exit status 2.
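The exception message above only reports the exit status; the actual reason Java exited with status 2 is in the child process's own error output. A minimal diagnostic sketch, assuming you rerun the exact argument list shown in the error by hand (the helper name run_and_report is hypothetical, not part of tabula):

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command without raising; return (exit status, stderr text)."""
    # stdout=/stderr=PIPE works on Python 3.6 (capture_output= needs 3.7+).
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    err = proc.stderr.decode(errors="replace")
    if proc.returncode != 0:
        # Echo the child's own error message, which the exception text omits.
        print("exit status", proc.returncode, file=sys.stderr)
        print(err, file=sys.stderr)
    return proc.returncode, err
```

Call it with the list copied from the error, e.g. run_and_report(['java', '-jar', <path to the tabula jar>, '--pages', '1', '--guess', <path to Test.pdf>]), and read what Java prints.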

Appreciate the help.


You need to escape backslashes or use a raw string:

df = tabula.read_pdf("C:\\XXXXX\\PDFExtractor\\Test.pdf")

or

df = tabula.read_pdf(r"C:\XXXXX\PDFExtractor\Test.pdf")

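otherwise Python interprets any backslash sequence it recognizes as an escape (for example, \t becomes a tab and \n a newline), so the path Tabula receives may not be the one you typed.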
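A quick way to see the difference, using a hypothetical path chosen only because it contains \t and \n:

```python
# Hypothetical path: "\t" and "\n" are valid escapes in a normal literal.
mangled = "C:\temp\new.pdf"    # "\t" becomes a TAB, "\n" a NEWLINE
escaped = "C:\\temp\\new.pdf"  # each backslash escaped explicitly
raw = r"C:\temp\new.pdf"       # raw string: backslashes kept as-is

print(escaped == raw)          # both spell the intended path
print(mangled == raw)          # the escapes changed the string
print(len(mangled), len(raw))  # two characters were swallowed
```

Both the escaped and the raw form produce the same string; only the plain literal is altered.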


I've found the error. I ran java -jar 'C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\tabula-1.0.1-jar-with-dependencies.jar' 'C:\Users\xxxxxx\PDFExtractor\Test.pdf' on the command line, and it throws an error.

But if I replace the single quotes around the JAR path with double quotes, it gives me the output of the parsed PDF on the command line:

java -jar "C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\tabula-1.0.1-jar-with-dependencies.jar" 'C:\Users\xxxxxx\PDFExtractor\Test.pdf'

Now, how do I get Python to pass the first part in double quotes?
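One point that may help: at the cmd.exe prompt, single quotes are not quoting characters, so Java receives them literally as part of the path. A list of arguments given to subprocess is not run through cmd.exe at all unless shell=True; on Windows it is joined into a command line following the rules of subprocess.list2cmdline, which wraps an argument in double quotes only when it contains a space or tab (or is empty). A minimal sketch to inspect that conversion, assuming hypothetical paths:

```python
import subprocess

# Hypothetical paths for illustration; substitute the ones from your error.
jar = r"C:\path\to\tabula-1.0.1-jar-with-dependencies.jar"
pdf = r"C:\path with spaces\Test.pdf"

args = ["java", "-jar", jar, "--pages", "1", "--guess", pdf]

# Shows how this list would be quoted on Windows: only arguments that
# contain spaces or tabs get wrapped in double quotes.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

If the JAR path has no spaces, it is passed through unquoted and still arrives intact, so adding quotes yourself should not be necessary; the single-vs-double quote difference seen by hand likely does not carry over to the Python call.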