How to use pdfMiner in Python to predictably read values

406 Views Asked by Jeff At 27 July 2025 at 12:52

I've been using pdfMiner to read values off of graphs, and so far it's been working great!

However, there is one area in which the data is read correctly but in an unpredictable order: all of the graphs' values come out right, but in a completely different order than they appear on the page.

This would not be a problem in itself. As long as I knew that, say, the last graph is always read first, I could structure my program around that. But pdfMiner seems almost totally unpredictable in the order in which it reads this data, and I can find no discernible pattern.

This is most probably because I am quite unfamiliar with pdfMiner, so I am not entirely sure how it works. It would be really helpful if someone could point me in the right direction.

Here is my data

And here is the conversion code I'm using:

# Assumes Python 3 and the pdfminer.six package.
from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage

print("Getting readable PDF")

# Shared resource manager and an in-memory buffer that collects the text.
rsrcmgr = PDFResourceManager()
retstr = StringIO()
laparams = LAParams()  # default layout-analysis parameters
device = TextConverter(rsrcmgr, retstr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

password = ""
maxpages = 0     # 0 means no page limit
caching = True
pagenos = set()  # an empty set means all pages

# Feed every page through the interpreter; the text accumulates in retstr.
with open("graphExtraction.pdf", "rb") as fp:
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages,
                                  password=password, caching=caching,
                                  check_extractable=True):
        interpreter.process_page(page)

device.close()
values = retstr.getvalue()  # extracted text (avoids shadowing the built-in str)
retstr.close()

There is 1 best solution below

SymX On 07 June 2015 at 17:30

Use the bounding box information that pdfMiner's layout analysis attaches to each text element to follow the flow of your document and figure out what comes first.
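As a rough sketch of that idea (not a definitive implementation): swap TextConverter for PDFPageAggregator so you get layout objects instead of a flat string, collect each text box's coordinates, and sort them yourself. The helper names reading_order and extract_boxes, and the row_tolerance value of 5 points, are my own assumptions for illustration; you would need to tune the tolerance to your graphs.

```python
def reading_order(boxes, row_tolerance=5.0):
    """Sort (x0, y0, x1, y1, text) tuples top-to-bottom, then left-to-right.

    PDF coordinates start at the bottom-left of the page, so a larger y1
    means the box sits higher up. Boxes whose top edges lie within
    row_tolerance points of a row's first box are treated as the same row.
    """
    ordered = sorted(boxes, key=lambda b: -b[3])  # highest box first
    rows = []
    for box in ordered:
        if rows and abs(rows[-1][0][3] - box[3]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    result = []
    for row in rows:
        result.extend(sorted(row, key=lambda b: b[0]))  # left to right
    return result


def extract_boxes(path):
    """Yield, for each page, a list of (x0, y0, x1, y1, text) tuples."""
    # Imported lazily so reading_order() works without pdfminer installed.
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    with open(path, "rb") as fp:
        for page in PDFPage.get_pages(fp):
            interpreter.process_page(page)
            layout = device.get_result()  # layout tree for the page just processed
            yield [
                (el.x0, el.y0, el.x1, el.y1, el.get_text().strip())
                for el in layout
                if isinstance(el, LTTextBox)
            ]
```

You would then loop over extract_boxes("graphExtraction.pdf") and pass each page's list to reading_order(). Because the order now comes from the coordinates rather than from the order in which the text was written into the PDF, it should be repeatable from file to file as long as the graphs keep the same positions.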
