Searching for matching words in a PDF using page.search_for

80 Views Asked by vani At 06 June 2025 at 22:12

I have a list of words that I am searching for in a PDF document using fitz (PyMuPDF) in Python. The code works for most of the words, except for a few such as "efficiency".

My code is given below:

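        # mpage: the text of the current page; match the phrase (optionally plural) as a whole word
        if re.search(rf'\b{re.escape(phrase.casefold())}s?\b', mpage.casefold()):
            # locate each occurrence of the phrase on the page as quads
            text_instances = page.search_for(phrase, quads=True)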

This code works for almost all words, but not for some, e.g. "efficiency". For that word, the if statement matches successfully, but the page.search_for call returns no matches. In my PDF, the first and second "f" of the word "efficiency" appear in different fonts. Is this the reason the word is not matched?
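One possible explanation for why the regex check passes while a literal search fails can be sketched with the standard library alone. The sketch assumes that the extracted page text stores "ffi" as the single ligature character U+FB03, which the question does not confirm; `page_text` is a made-up sample string, not text from the actual PDF.

```python
import re

# Assumption: the page text contains the "ffi" ligature (U+FB03) as one character.
page_text = "Improved e\ufb03ciency of the pump"
phrase = "efficiency"

# A literal comparison fails, because the ligature is a single code point.
literal_match = phrase in page_text

# str.casefold() applies Unicode full case folding, which expands U+FB03 to "ffi",
# so the regex runs against text in which the ligature has been spelled out.
folded_text = page_text.casefold()
pattern = rf"\b{re.escape(phrase.casefold())}s?\b"
regex_match = re.search(pattern, folded_text) is not None

print(literal_match, regex_match)
```

If the PDF does contain such a ligature, the casefold() call in the if statement would hide it, while a search that keeps ligatures intact would not find the plain spelling.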

There is 1 best solution below

vani On 18 December 2023 at 11:30

I found the solution. To make the search disregard ligatures, set flags=0 in the call:

text_instances = page.search_for(phrase, flags=0, quads=True)

This link helped me find the solution: https://github.com/pymupdf/PyMuPDF/issues/1503

Thanks to @jorj-mckie https://stackoverflow.com/users/4474869/jorj-mckie
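The fix can also be wrapped in a small helper, shown here as a minimal sketch. The name `search_with_ligature_fallback` is hypothetical and not part of PyMuPDF; the helper only assumes that `page` has a `search_for` method accepting `flags` and `quads`, as in the calls above. Note that flags=0 clears every text flag, not just the one that preserves ligatures, so the retry may also treat hyphenation and whitespace differently.

```python
def search_with_ligature_fallback(page, phrase):
    """Hypothetical helper: search with default flags, then retry with flags=0."""
    # First attempt: the library's default text flags.
    hits = page.search_for(phrase, quads=True)
    if not hits:
        # Retry with all text flags cleared, so ligatures such as "ffi"
        # are not kept as single characters during the search.
        hits = page.search_for(phrase, flags=0, quads=True)
    return hits


# Usage sketch (assumes PyMuPDF is installed and "doc.pdf" exists):
#   import fitz
#   page = fitz.open("doc.pdf")[0]
#   text_instances = search_with_ligature_fallback(page, "efficiency")
```

Trying the default flags first keeps the behaviour unchanged for words that already match, and only falls back to flags=0 for the problem cases.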
