How to extract text from PDFs with complex layouts using Python?

126 Views Asked by deepak At 17 February 2024 at 14:56

I am extracting text from PDFs, but it is hard to extract text from complex layouts such as two-column pages and tables of different kinds (tables with borders or without borders), as well as combinations of these, like a table inside a two-column layout or tables placed next to each other. It gets even harder when all of these layouts appear in one PDF. Is there a way to extract text from a PDF without losing its structure?
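The post does not describe any table handling. One approach sometimes used for the table part is PyMuPDF's `Page.find_tables()`, available in recent PyMuPDF versions (1.23 and later). Its `strategy="lines"` setting targets tables drawn with ruling lines, and `strategy="text"` tries to infer borderless tables from text alignment. The minimal sketch below uses the detected table bboxes to separate table text from the surrounding prose so each can be processed on its own. `inside`, `split_blocks_by_tables` and `extract_tables` are hypothetical helpers. Trying the text strategy only when no bordered table is found on a page is a simplifying assumption, and the text strategy can report false positives on ordinary prose.

```python
from typing import Tuple

# A bbox as (x0, y0, x1, y1) in PDF points, the same order PyMuPDF uses.
BBox = Tuple[float, float, float, float]


def inside(inner: BBox, outer: BBox, tol: float = 1.0) -> bool:
    """Return True if `inner` lies within `outer`, allowing a small tolerance."""
    return (inner[0] >= outer[0] - tol and inner[1] >= outer[1] - tol
            and inner[2] <= outer[2] + tol and inner[3] <= outer[3] + tol)


def split_blocks_by_tables(blocks, table_bboxes):
    """Partition text blocks into (outside any table, inside some table).

    `blocks` are tuples shaped like PyMuPDF's page.get_text("blocks") output:
    (x0, y0, x1, y1, text, block_no, block_type).
    """
    prose, in_table = [], []
    for b in blocks:
        target = in_table if any(inside(b[:4], t) for t in table_bboxes) else prose
        target.append(b)
    return prose, in_table


def extract_tables(pdf_path):
    """Hypothetical helper: look for bordered tables first, then borderless ones."""
    import fitz  # PyMuPDF; imported lazily so the pure helpers above work without it

    results = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            tables = list(page.find_tables(strategy="lines").tables)
            if not tables:  # assumption: no ruled tables -> try text alignment
                tables = list(page.find_tables(strategy="text").tables)
            table_bboxes = [tuple(t.bbox) for t in tables]
            # block_type 0 is text, 1 is image
            blocks = [b for b in page.get_text("blocks") if b[6] == 0]
            prose, _ = split_blocks_by_tables(blocks, table_bboxes)
            results.append({
                "page": page.number,
                "tables": [t.extract() for t in tables],  # rows as lists of cells
                "prose": [b[4] for b in prose],
            })
    return results
```

A block that only partly overlaps a table is kept as prose here; whether that is right depends on the documents being processed.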

I tried getting the layout dictionary of the PDF with PyMuPDF, including each element's coordinates (bbox), but I couldn't tell the different layouts apart.
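Because PyMuPDF already reports a bbox for every text block, one simple heuristic for two-column pages is to treat any block that crosses the page's vertical midline as full-width and to assign every other block to the left or right column by its horizontal center. This is an assumption, not a built-in PyMuPDF feature. Full-width blocks such as titles split the page into bands, and within each band the left column is read before the right. `reading_order` and `page_text_in_order` are illustrative names. The midline assumption will break on pages with uneven columns or more than two columns.

```python
def reading_order(blocks, page_width, mid_ratio=0.5):
    """Order text blocks for a page that may mix full-width and two-column text.

    `blocks` are tuples shaped like PyMuPDF's page.get_text("blocks") output:
    (x0, y0, x1, y1, text, block_no, block_type).
    """
    mid = page_width * mid_ratio  # assumed column boundary
    ordered, left, right = [], [], []

    def flush():
        # Emit the current band: whole left column, then whole right column.
        ordered.extend(sorted(left, key=lambda b: b[1]))
        ordered.extend(sorted(right, key=lambda b: b[1]))
        left.clear()
        right.clear()

    for b in sorted(blocks, key=lambda b: (b[1], b[0])):  # top-to-bottom
        x0, x1 = b[0], b[2]
        if x0 < mid < x1:  # crosses the midline -> full-width block
            flush()
            ordered.append(b)
        elif (x0 + x1) / 2 < mid:
            left.append(b)
        else:
            right.append(b)
    flush()
    return ordered


def page_text_in_order(pdf_path, page_no=0):
    """Hypothetical helper: read one page's text blocks in column order."""
    import fitz  # PyMuPDF; imported lazily so reading_order works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_no]
        blocks = [b for b in page.get_text("blocks") if b[6] == 0]  # text only
        return "\n".join(b[4] for b in reading_order(blocks, page.rect.width))
```

With synthetic blocks for a title, two columns of two blocks each, and a full-width footer on a 600-point-wide page, `reading_order` returns them as title, both left-column blocks, both right-column blocks, then the footer.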
