I use Pdfplumber to extract the table on page 2, section 3 (normally). But it only works on some pdf, others do not work. For failed pdf files, it seems like Pdfplumber read the button table instead of the table I want.
How can I get the table? link of the pdf which doesn't work: pdfA
link of the pdf which works: pdfB
Here is my code:
import pdfplumber
pdf = pdfplumber.open("/Users/chueckingmok/Desktop/selenium/Shell Omala 68.pdf")
page = pdf.pages[1]
table=page.extract_table()
import pandas as pd
df = pd.DataFrame(table[1:], columns=table[0])
df
But the table I want in page 2 is
However, this code works for pdfB (which I mentioned above).
Btw, the table I want in each pdf is in section 3.
Anyone can help?
Many thanks Joan
Updated: I just found a good package to extract pdf file without any problems. the package is fitz, and it also names as PyMuPDF.
Hey Here is the proper solution for that problem but first please read some of my points below
Below Is the solution of your problem,
See the output of the Above Code