Different behaviour of re.search function in Python


I have come across unexpected behaviour of the re.search function, which made me think that there is an implicit \b anchor in the pattern. Is this the case?

<code>
import re

text = "bowl"

print(re.search(r"b|bowl", text))    # the first alternative matches here
print(re.search(r"o|bowl", text))    # but the first alternative does not match here
print(re.search(r"w|bowl", text))    # nor here
print(re.search(r"l|bowl", text))    # nor here
print(re.search(r"bo|bowl", text))   # the first alternative matches here
print(re.search(r"bow|bowl", text))  # the first alternative matches here
</code>

OUTPUT

<re.Match object; span=(0, 1), match='b'>
<re.Match object; span=(0, 4), match='bowl'>
<re.Match object; span=(0, 4), match='bowl'>
<re.Match object; span=(0, 4), match='bowl'>
<re.Match object; span=(0, 2), match='bo'>
<re.Match object; span=(0, 3), match='bow'>

I have tried to find out whether this is the case, but I couldn't find any explanation.

BEST ANSWER

I'm not a regex expert, so I'll use simple words to describe what happens internally.

search scans the string from left to right, and the alternatives separated by | are also tried from left to right. Unlike match, which only tries the pattern at the start of the string, search moves forward one position at a time until the pattern matches somewhere.
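To make the difference between match and search concrete, here is a small sketch using the same text as the question. The variable names are only for illustration.

```python
import re

text = "bowl"

# re.match only tries the pattern at position 0, so "o" cannot match there.
anchored = re.match(r"o", text)

# re.search retries at each later position until the pattern matches.
scanned = re.search(r"o", text)

print(anchored)        # None
print(scanned.span())  # (1, 2)
```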

Take this:

re.search(r"o|bowl", text)

The matcher starts at the first character of the input string, b. The o alternative is tried first at that position and fails, so the engine tries the second alternative, bowl, at the same position. That one matches, so all four characters of bowl are consumed and the search stops. Only if every alternative had failed at that position would the engine skip to the next character, where o would match.
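The order of operations described above can be sketched as two nested loops: start positions on the outside, alternatives on the inside. This is only a rough model that handles literal alternatives; simulated_search is a hypothetical helper, not how the re module is actually implemented.

```python
import re

def simulated_search(alternatives, text):
    """Hypothetical model of search for literal alternatives only."""
    for start in range(len(text) + 1):   # outer loop: each start position
        for alt in alternatives:         # inner loop: alternatives, left to right
            if text.startswith(alt, start):
                return start, start + len(alt), alt
    return None

text = "bowl"

# At position 0, "o" fails but "bowl" succeeds, so position 1 is never tried.
print(simulated_search(["o", "bowl"], text))  # (0, 4, 'bowl')
print(re.search(r"o|bowl", text).span())      # (0, 4)

# At position 0, "b" already succeeds, so "bowl" is never tried.
print(simulated_search(["b", "bowl"], text))  # (0, 1, 'b')
```

The earliest start position always wins. The order of the alternatives only decides between candidates that start at the same position.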

If you try:

re.search("o|bar", text)

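then o will be matched: neither o nor bar matches at the first character, so the engine moves on to the second character, where o matches.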

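Note that this is not specific to Python. Regex engines in general report the leftmost match, so an alternative that matches at an earlier position wins over one that is merely listed first.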

If you want the alternative behaviour, where an earlier pattern takes priority wherever it occurs in the string, you could write:

re.search("o", text) or re.search("bar", text)
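The same idea can be generalised to any number of patterns. Below is a minimal sketch; search_in_order is a hypothetical helper name, and it assumes each pattern should be searched over the whole string before the next one is considered.

```python
import re

def search_in_order(patterns, text):
    """Return the match of the first pattern that matches anywhere in text."""
    for pattern in patterns:
        m = re.search(pattern, text)
        if m:
            return m
    return None

m = search_in_order(["o", "bowl"], "bowl")
print(m.group(), m.span())  # o (1, 2)
```

Applied to the text from the question, this prefers o even though bowl matches at an earlier position.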