Python re doesn't match last capture group


For the following code:

import re

t1 = 'tyler vs ryan'
p1 = re.compile('(.*?) vs (.*?)')
print(p1.findall(t1))

the output is:

[('tyler', '')]

but I would've expected this:

[('tyler', 'ryan')]

I have found that if I add a delimiter I can get it to work:

t2 = 'tyler vs ryan!'               # Notice the exclamation mark
p2 = re.compile('(.*?) vs (.*?)!')  # Notice the exclamation mark
print(p2.findall(t2))

outputs:

[('tyler', 'ryan')]

Is there a way I can get my matches without having a custom delimiter?

There are 5 best solutions below

0
On BEST ANSWER

You don't need a custom delimiter. Anchor the pattern to the end of the string with $ instead:

t1 = 'tyler vs ryan'
p1 = re.compile('(.*?) vs (.*?)$')
print(p1.findall(t1))

gives:

[('tyler', 'ryan')]

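$ matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. Because the pattern now has to reach that position, the lazy second group is forced to expand until it gets there.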
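A minimal sketch of that behaviour, assuming Python's standard re module; the two-line sample string is made up for illustration:

```python
import re

# With $ the lazy second group has to stretch to the end of the string.
anchored = re.findall(r'(.*?) vs (.*?)$', 'tyler vs ryan')

# In MULTILINE mode $ also matches before each newline,
# so every line of the (hypothetical) input yields its own pair.
lines = 'tyler vs ryan\nbob vs alice'
multiline = re.findall(r'(.*?) vs (.*?)$', lines, re.MULTILINE)

print(anchored)
print(multiline)
```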

1
On

(.*?) is non-greedy: it matches as little as it can, which for the group after the vs is the empty string.

Try (.*) instead, or ([^ ]*) if the second name cannot contain spaces.
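To compare the alternatives side by side, here is a short sketch using the standard re module. The second input, with a two-word name, is an assumption added only to show where (.*) and ([^ ]*) differ:

```python
import re

text = 'tyler vs ryan'

# Lazy second group: satisfied immediately by the empty string.
lazy = re.findall(r'(.*?) vs (.*?)', text)

# Greedy second group: runs to the end of the line.
greedy = re.findall(r'(.*?) vs (.*)', text)

# Negated class: runs until the next space (or the end of the string).
no_spaces = re.findall(r'(.*?) vs ([^ ]*)', text)

# Hypothetical input with a two-word second name.
longer = 'tyler vs ryan smith'
greedy_longer = re.findall(r'(.*?) vs (.*)', longer)
no_spaces_longer = re.findall(r'(.*?) vs ([^ ]*)', longer)

print(lazy, greedy, no_spaces)
print(greedy_longer, no_spaces_longer)
```

Which one fits depends on whether the text after the name should be part of the capture.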

0
On

If you are assured of single-name combatants, you could use a regex like:

r'\s*(\S+)\s*vs\s*(\S+)\s*'

Your use of findall() suggests you expect to match multiple pairings. If not, you may want to use search() together with the ^ and $ anchors to bound the match more tightly.
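Both uses can be sketched roughly as follows, assuming single-word names; the multi-line input and the trailing-text input are invented for the example:

```python
import re

pairing = r'\s*(\S+)\s*vs\s*(\S+)\s*'

# findall() picks up every pairing in a longer (hypothetical) input.
many = re.findall(pairing, 'tyler vs ryan\nbob vs alice')

# search() with ^ and $ accepts only a string that is exactly one pairing.
anchored = re.compile(r'^' + pairing + r'$')
match = anchored.search('tyler vs ryan')
no_match = anchored.search('tyler vs ryan and more')

print(many)
print(match.groups())
print(no_match)
```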

0
On

The regex captures the shortest string it can; that's what the question mark after .* signifies. So as soon as it has matched the text " vs ", the second group captures an empty string and the match stops. You can step through it on regex101:

Direct link: https://regex101.com/r/hO4lM7/2

If you use:

re.compile('(.*?) vs (.*)')

that is, without the 2nd question mark, it will capture the text after vs as well.
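One way to see where each match stops is to inspect the whole match rather than the groups. This is only a sketch using re.search() on the string from the question:

```python
import re

text = 'tyler vs ryan'

# Lazy second group: the overall match ends right after " vs ".
lazy = re.search(r'(.*?) vs (.*?)', text)
lazy_text, lazy_span = lazy.group(0), lazy.span()

# Greedy second group: the overall match covers the whole string.
greedy = re.search(r'(.*?) vs (.*)', text)
greedy_text, greedy_span = greedy.group(0), greedy.span()

print(repr(lazy_text), lazy_span)
print(repr(greedy_text), greedy_span)
```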

0
On

The non-greedy ? is preventing the second group from capturing the second word. It would be better to use:

r'(.*) vs (.*)'
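One thing to keep in mind with a greedy first group, sketched here with a made-up input that contains two separators: it splits at the last " vs " rather than the first.

```python
import re

text = 'a vs b vs c'  # hypothetical input with two separators

# Greedy first group backtracks from the end, so it splits at the last " vs ".
greedy_first = re.findall(r'(.*) vs (.*)', text)

# Lazy first group grows from the start, so it splits at the first " vs ".
lazy_first = re.findall(r'(.*?) vs (.*)', text)

print(greedy_first)
print(lazy_first)
```

For the single-separator string in the question both patterns give the same result.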