I am new to reading text from PDFs with Python. I am using tika to extract the content of a PDF, and it seems to fail on bold headings.
In the example above, it reads "Rating the Items" as "RRaattiinngg tthhee IItteemms", and the same thing happens with other headings. Is this a problem with the library I am using, or with the PDF itself?
Code I am using:
import config  # my own settings module; PATH is the path to the PDF
from tika import parser

raw = parser.from_file(config.PATH)
print(raw['content'])
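As a stopgap, the doubled letters could be collapsed after extraction. This is only a minimal sketch: it assumes the damage always takes the form of each character being repeated twice in a row (possibly with one character left single, as in "IItteemms"), and `collapse_doubled` and `collapse_text` are hypothetical helper names, not part of tika. It is a heuristic, so a short word made entirely of genuine double letters would be collapsed by mistake.

```python
import re


def collapse_doubled(token):
    """Collapse a token like 'RRaattiinngg' to 'Rating' (heuristic)."""
    # Replace every adjacent pair of identical characters with one copy.
    collapsed = re.sub(r'(.)\1', r'\1', token)
    removed = len(token) - len(collapsed)
    # Accept the result only if about half of the token was removed,
    # i.e. nearly every character was doubled. Ordinary words such as
    # 'book' or 'keeper' lose too few characters and are left untouched.
    if len(token) >= 4 and removed >= len(token) // 2:
        return collapsed
    return token


def collapse_text(text):
    """Apply collapse_doubled to each whitespace-separated token."""
    return re.sub(r'\S+', lambda m: collapse_doubled(m.group(0)), text)


print(collapse_text("RRaattiinngg tthhee IItteemms"))
```

This could be applied to `raw['content']` before printing, but it only hides the symptom and does not explain where the doubling comes from.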
Is there a better library for extracting text from PDFs?
Thank you.