re.findall giving different results from re.compile().findall


Why does re.compile(...).findall not find "um" when "um" is at the beginning of the string? It works fine if "um" isn't at the beginning of the string, as the last two lines below show.

>>> s = "um"
>>> re.findall(r"\bum\b", s, re.IGNORECASE)
['um']
>>> re.compile(r"\bum\b").findall(s, re.IGNORECASE)
[]
>>> re.compile(r"\bum\b").findall(s + " foobar", re.IGNORECASE)
[]
>>> re.compile(r"\bum\b").findall("foobar " + s, re.IGNORECASE)
['um']

I would have expected the two options to be identical. What am I missing?

Answer by Tim Peters (best answer)

You intended to pass re.IGNORECASE to the compile() function, but in the failing cases you're actually passing it to the findall() method of the compiled pattern. That method takes no flags argument: its second argument is pos, so the flag is interpreted as an integer giving the position at which the search begins. Its value as an integer isn't defined, but happens to be 2 today:

>>> int(re.IGNORECASE)
2
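
To make the mix-up concrete, here is a minimal sketch showing that passing the flag as the second argument behaves the same as passing its integer value as pos. The sample string and variable names are made up for illustration, and the printed results assume the flag's integer value is small enough (it is 2 today) that only the first "um" is skipped.

```python
import re

pattern = re.compile(r"\bum\b")   # note: no flags given to compile()
text = "um foobar um"             # illustrative sample string

# The second positional argument of a compiled pattern's findall() is pos,
# so the flag is silently used as a starting index.
with_flag = pattern.findall(text, re.IGNORECASE)
with_pos = pattern.findall(text, int(re.IGNORECASE))
from_start = pattern.findall(text)

print(with_flag == with_pos)  # True: both calls start searching at the same index
print(from_start)             # ['um', 'um'] when the search starts at position 0
```

The module-level re.findall(pattern, string, flags) has a different signature, which is why the same third argument works there.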

Rewrite the code to work as intended, and it's fine; for example:

>>> re.compile(r"\bum\b", re.IGNORECASE).findall(s + " foobar") # pass to compile()
['um']

As originally written, it can't work unless "um" starts at or after position 2:

>>> re.compile(r"\bum\b").findall(" " + s, re.IGNORECASE)
[]
>>> re.compile(r"\bum\b").findall("  " + s, re.IGNORECASE) # starts at 2
['um']
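
For reference, a short sketch of three ways to get a case-insensitive search with the flag in the right place. The sample text is invented for illustration, and the (?i) inline flag is an alternative not mentioned above.

```python
import re

text = "Um, well, um, I think"  # illustrative sample string

# 1. Pass the flag to compile(), then call findall() with the string only.
compiled = re.compile(r"\bum\b", re.IGNORECASE)
via_compile = compiled.findall(text)

# 2. Embed the flag in the pattern itself with the inline (?i) syntax.
via_inline = re.compile(r"(?i)\bum\b").findall(text)

# 3. Use the module-level function, which does accept a flags argument.
via_module = re.findall(r"\bum\b", text, flags=re.IGNORECASE)

print(via_compile)  # ['Um', 'um']
print(via_compile == via_inline == via_module)  # True
```

Passing flags by keyword to the module-level function (flags=...) makes the intent explicit and avoids confusing it with a positional pos argument.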