Borderless PDF extraction to JSON is not working properly with the Python camelot library

383 Views Asked by Goutam Ghosh At 25 June 2025 at 17:59

Can anyone help? After extracting a borderless PDF table to JSON with the Python camelot library, the output does not match the PDF exactly: some content is missing after extraction.

There is 1 best solution below

Stefano Fiorucci - anakin87 On 24 September 2020 at 13:40

I tried the following code:

import camelot

pdf_path = '/YOUR/FILEPATH.pdf'
tables = camelot.read_pdf(pdf_path, flavor='stream')

There are two problems:

The header font is not read properly, so you get strange characters like (cid:71)...
With flavor='lattice', the table isn't detected. With flavor='stream', the table is detected, but the cells aren't detected correctly.
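
To see how much of the output is affected by the first problem, the extracted cells can be scanned for (cid:N) placeholders. As far as I know, pdfminer (which Camelot uses for text extraction) emits these when a font gives no usable mapping from a glyph to Unicode. This is only a minimal sketch using the standard library; the helper names and the sample rows are made up for illustration.

```python
import re

# Matches "(cid:NN)" placeholders that appear when a glyph could not be mapped to text.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def count_cid_markers(rows):
    """Return (cells_with_markers, total_cells) for a list of row lists."""
    total = 0
    affected = 0
    for row in rows:
        for cell in row:
            total += 1
            if CID_PATTERN.search(str(cell)):
                affected += 1
    return affected, total


def strip_cid_markers(rows):
    """Return a copy of rows with the placeholders removed.

    This only cleans the output; the original text is not recovered.
    """
    return [[CID_PATTERN.sub("", str(cell)).strip() for cell in row] for row in rows]


# Hypothetical sample: a header row with an unmapped font plus one data row.
sample = [["(cid:71)(cid:72)", "Amount"], ["Item A", "10"]]
affected, total = count_cid_markers(sample)
print(f"{affected} of {total} cells contain (cid:N) markers")
```

With Camelot, the rows could come from tables[0].df.values.tolist(). If most header cells are affected, the missing content is probably a font problem in the PDF rather than a table-detection problem.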

At the moment, I don't think Camelot can extract this table properly. The maintainers are working on fixing the second problem (see this and this).
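
If you still want to experiment, the stream flavor accepts tuning parameters such as row_tol and edge_tol, and each table exposes a parsing_report that helps compare attempts. The sketch below is not a fix for this particular PDF and may not improve it. The function names, the sample rows, and the assumption that the first extracted row holds the column headers are all mine.

```python
import json


def rows_to_records(rows):
    """Treat the first row as the header and return a list of dicts."""
    if not rows:
        return []
    header = [str(h).strip() for h in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


def extract_stream_rows(pdf_path, pages="1", row_tol=2, edge_tol=50):
    """Read tables with the stream flavor and return one list of rows per table.

    row_tol and edge_tol are passed through to Camelot; the values here are
    its documented defaults, so change them to experiment.
    """
    import camelot  # imported lazily so rows_to_records works without Camelot

    tables = camelot.read_pdf(
        pdf_path, pages=pages, flavor="stream", row_tol=row_tol, edge_tol=edge_tol
    )
    for table in tables:
        # Includes accuracy and whitespace figures for comparing runs.
        print(table.parsing_report)
    return [table.df.values.tolist() for table in tables]


# Hypothetical rows standing in for one extracted table.
sample_rows = [["Name", "Qty"], ["Bolt", "4"], ["Nut", "7"]]
print(json.dumps(rows_to_records(sample_rows), indent=2))
```

Comparing the JSON records against the PDF by eye, run by run, is usually the quickest way to tell whether a parameter change recovered the missing content or just moved it to another cell.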
