Extraction issue with bold heading letters from PDF using Tika

45 Views Asked by Glinty At 29 September 2023 at 00:13

I am new to reading text from PDFs using Python. I am using tika to extract content from a PDF, and the extraction seems to fail on bold headings.

example image

In the example above, it reads "Rating the Items" as "RRaattiinngg tthhee IItteemms", and this happens with other headings as well. Is this caused by the library I am using, or is the issue with the PDF itself?

Code I am using:

from tika import parser
import config  # my own module; PATH holds the path to the PDF

raw = parser.from_file(config.PATH)
print(raw['content'])
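
As a stopgap, the doubled letters could be collapsed after extraction. The sketch below is only a post-processing workaround under stated assumptions, not a Tika feature: the helper names collapse_doubled_word and collapse_doubled_text are hypothetical, and it assumes an affected word consists entirely of repeated character pairs.

```python
import re


def collapse_doubled_word(word: str) -> str:
    """Collapse a word such as 'tthhee' to 'the'.

    The word is changed only if it has even length and every
    character pair (positions 0-1, 2-3, ...) holds two equal characters.
    """
    is_doubled = (
        len(word) >= 2
        and len(word) % 2 == 0
        and all(word[i] == word[i + 1] for i in range(0, len(word), 2))
    )
    # Keep every second character when the whole word is doubled.
    return word[::2] if is_doubled else word


def collapse_doubled_text(text: str) -> str:
    """Apply collapse_doubled_word to each whitespace-separated token."""
    return re.sub(r"\S+", lambda m: collapse_doubled_word(m.group()), text)


print(collapse_doubled_text("RRaattiinngg tthhee"))
```

This strict rule leaves imperfectly doubled words such as "IItteemms" unchanged, and it can wrongly shorten genuine tokens like "11" or "oo", so it is probably safest to apply it only to lines known to be headings.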

Is there a better library for extracting text from PDFs?
Thank you.
