Convert PDF to HTML using pdfminer?


I am working on a project to convert multiple PDF files into basic HTML to put onto a site. I want to extract the text and the font sizes from the PDF to parse directly into HTML tags.

I have tried using pdfplumber, but I am having trouble matching the font sizes to the text they belong to. Is there a simple way to do this with pdfplumber, or is there another library that can achieve it?

Cem Koçak On

You can use pdfminer.six (the Python 3 compatible version of pdfminer):

from pdfminer.high_level import extract_text
from pdfminer.layout import LAParams, LTTextContainer

I am having trouble coming up with code that works on a PDF on my PC and will also work on your PDF, which I haven't seen. I will include code if I can take a look at your PDF.
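Without the actual file, something along these lines may be a starting point. It is only a rough sketch: it assumes the text is laid out in ordinary lines, treats the most common glyph size as body text, and uses made-up thresholds (1.2x and 1.6x the body size) to pick h2/h1. The function names and the thresholds are my own placeholders, not part of pdfminer.six. Note that `LTChar.size` is the glyph height pdfminer reports, which is usually close to the font size but not guaranteed to match it.

```python
import html
from collections import Counter


def iter_lines_with_size(pdf_path):
    """Yield (text, size) for every non-empty text line in the PDF."""
    # Imported here so the helpers below can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, curves, images, etc.
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                sizes = [round(ch.size, 1) for ch in line if isinstance(ch, LTChar)]
                text = line.get_text().strip()
                if text and sizes:
                    # Take the most common glyph size as the size of the whole line.
                    yield text, Counter(sizes).most_common(1)[0][0]


def tag_for_size(size, body_size):
    """Hypothetical size-to-tag mapping; the thresholds are guesses to tune."""
    if size >= body_size * 1.6:
        return "h1"
    if size >= body_size * 1.2:
        return "h2"
    return "p"


def lines_to_html(lines):
    """Turn (text, size) pairs into one HTML element per line."""
    lines = list(lines)
    if not lines:
        return ""
    # Assumption: the size that occurs most often is the body text size.
    body_size = Counter(size for _, size in lines).most_common(1)[0][0]
    parts = []
    for text, size in lines:
        tag = tag_for_size(size, body_size)
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(parts)
```

You would call it as `print(lines_to_html(iter_lines_with_size("example.pdf")))`, where `example.pdf` stands in for your own file. Each PDF line becomes its own element, so you will probably want to merge consecutive `<p>` lines into paragraphs once you see how your documents come out.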