I have a list of words that I am searching for in a PDF document using fitz (PyMuPDF) in Python. The code works for most of the words, but fails for a few, such as "efficiency".
My code is given below:
import re
if re.search(rf'\b{re.escape(phrase.casefold())}s?\b', mpage.casefold()):
    text_instances = page.search_for(phrase, quads=True)
For the word "efficiency", the if statement matches, but the page.search_for call returns no matches. In the image below, the first and second "f" of "efficiency" appear in different fonts. Is that why the word is not matched?
I found the solution. To disregard ligatures, set flags=0 in the page.search_for call.
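Here is a minimal sketch of the fix and of why the regex check passed while the search did not. It assumes the PDF stores "ffi" as the single ligature character U+FB03, which I have not verified for this particular file. The names LIGATURE_TEXT, regex_matches and find_phrase are made up for illustration; only page.search_for with its quads and flags arguments comes from PyMuPDF.

```python
import re

# Assumption: the extracted page text contains the "ffi" ligature (U+FB03)
# instead of the three separate letters f, f, i.
LIGATURE_TEXT = "Energy e\ufb03ciency matters"


def regex_matches(phrase, page_text):
    """Mirror the regex check from the question.

    str.casefold() expands ligatures such as U+FB03 into plain letters,
    so the check can succeed even when the raw text holds a ligature.
    """
    pattern = rf"\b{re.escape(phrase.casefold())}s?\b"
    return re.search(pattern, page_text.casefold()) is not None


def find_phrase(page, phrase):
    """Hypothetical helper: search a fitz.Page with flags=0.

    Passing flags=0 clears the default text flags, so ligatures are
    not preserved in the text that search_for looks through.
    """
    return page.search_for(phrase, quads=True, flags=0)


# The literal string is absent from the raw text, yet the regex check passes.
print("efficiency" in LIGATURE_TEXT)               # False
print(regex_matches("efficiency", LIGATURE_TEXT))  # True
```

With PyMuPDF installed, find_phrase(page, "efficiency") would be called for each page of a document opened with fitz.open. Note that flags=0 also switches off the other default text flags (such as whitespace preservation), so it may be worth checking that the remaining words are still found.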
This link helped me find the solution: https://github.com/pymupdf/PyMuPDF/issues/1503
Thanks to @jorj-mckie https://stackoverflow.com/users/4474869/jorj-mckie