Convert PDF to HTML using pdfminer?

158 Views Asked by Cai Samuels At 17 January 2024 at 10:25

I am working on a project to convert multiple PDF files into basic HTML to put onto a site. I want to extract the text and the font sizes from the PDF to parse directly into HTML tags.

I have tried using pdfplumber, but I am having trouble getting the font sizes to match up with the text. Is there a simple way to do this with pdfplumber, or is there another library that can achieve it?


There is 1 best solution below

Cem Koçak On 17 January 2024 at 13:45

You can use pdfminer.six (the Python 3 compatible version of pdfminer):

from pdfminer.high_level import extract_text
from pdfminer.layout import LAParams, LTTextContainer

I am having trouble coming up with code that works on a PDF on my PC and will also work on your PDF, which I haven't seen. I will include code if I can take a look at your PDF.
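In the meantime, here is a minimal sketch of the general approach, not tested against your files. It assumes pdfminer.six is installed and that font size alone is enough to tell headings from body text. It uses extract_pages rather than extract_text, because extract_text returns plain strings without font information. The helper names (iter_lines_with_size, size_to_tag, lines_to_html) and the 1.2x / 1.6x thresholds are my own illustrative choices, not part of pdfminer, and you will probably need to tune them per document.

```python
from collections import Counter
from html import escape


def iter_lines_with_size(pdf_path):
    """Yield (text, font_size) for each non-empty text line in the PDF."""
    # Imported inside the function so the pure-Python helpers below can be
    # used and tested without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_layout in extract_pages(pdf_path):
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, rules, images, etc.
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                # Each LTChar carries its own size; LTAnno objects (spaces,
                # newlines inserted by layout analysis) do not.
                sizes = [round(ch.size, 1) for ch in line if isinstance(ch, LTChar)]
                text = line.get_text().strip()
                if text and sizes:
                    # Use the most common glyph size on the line.
                    yield text, Counter(sizes).most_common(1)[0][0]


def size_to_tag(size, body_size):
    """Map a font size to an HTML tag (thresholds are arbitrary assumptions)."""
    if size >= body_size * 1.6:
        return "h1"
    if size >= body_size * 1.2:
        return "h2"
    return "p"


def lines_to_html(lines):
    """Render (text, font_size) pairs as basic HTML, one tag per line."""
    lines = list(lines)
    if not lines:
        return ""
    # Assume the most frequent size in the document is the body text size.
    body_size = Counter(size for _, size in lines).most_common(1)[0][0]
    parts = []
    for text, size in lines:
        tag = size_to_tag(size, body_size)
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(parts)
```

You would call it as lines_to_html(iter_lines_with_size("input.pdf")), where "input.pdf" is a placeholder path. Note that this emits one tag per line, so a paragraph that wraps over several lines becomes several p elements; merging consecutive lines of the same size is a likely next step.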
