Converting PDF Table from URL into a Pandas Dataframe?

I am having trouble converting PDF data into a DataFrame, and whether it works depends on how the PDF was uploaded to the website.

Hi all,

Does anyone have any ideas on how to read an uploaded PDF's data into a pandas DataFrame? It works for some PDFs but not for others.

For example, with this URL, https://www.rrc.texas.gov/media/ep0le0dv/2022-january-01-0692.pdf, I was able to get the data easily like so:

import tabula as tb

pdf_url = 'https://www.rrc.texas.gov/media/ep0le0dv/2022-january-01-0692.pdf'
# read_pdf returns a list of DataFrames, one per table detected on the page
tables = tb.read_pdf(pdf_url, pages=1, guess=True)

But for other links, where I cannot highlight any values in the PDF (it looks like it was just faxed in, so it is probably a scanned image with no text layer), like this URL, https://rrc.texas.gov/media/uzzdihmq/2023-july-10-0026.pdf, I get stuck. I have tried tabula, pdfplumber, and pytesseract so far, with no success.

Does anyone have any ideas? Thanks in advance!
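
For reference, here is a rough sketch of the kind of OCR route that might apply to the scanned PDF, not a confirmed solution. It assumes `requests`, `pdf2image` (which needs Poppler installed), `pytesseract` (which needs the Tesseract binary), and pandas are available. The function names `rows_from_ocr_text` and `scanned_pdf_to_dataframe` are made up for illustration, and splitting columns on runs of two or more spaces is only a guess at how the OCR output will be laid out.

```python
import re


def rows_from_ocr_text(text):
    """Split OCR text into rows of cells, treating 2+ whitespace chars as a column gap."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", line))
    return rows


def scanned_pdf_to_dataframe(pdf_url, page=1, dpi=300):
    """Download a scanned PDF, OCR one page, and return the rows as a DataFrame."""
    # Third-party imports are kept local so the parsing helper above
    # can be used and tested without the OCR stack installed.
    import pandas as pd
    import pytesseract
    import requests
    from pdf2image import convert_from_bytes

    response = requests.get(pdf_url, timeout=30)
    response.raise_for_status()

    # Render only the requested page to an image (scanned PDFs have no text layer).
    images = convert_from_bytes(
        response.content, dpi=dpi, first_page=page, last_page=page
    )

    # --psm 6 treats the page as one uniform block of text;
    # preserve_interword_spaces keeps the gaps between columns (assumption:
    # the gaps survive OCR well enough to separate cells).
    text = pytesseract.image_to_string(
        images[0], config="--psm 6 -c preserve_interword_spaces=1"
    )
    return pd.DataFrame(rows_from_ocr_text(text))
```

Calling `scanned_pdf_to_dataframe(pdf_url)` with the second URL would be the intended use. Rows with differing cell counts end up padded with missing values, so the result would likely still need manual cleanup, and the OCR quality depends heavily on the scan.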
