lookbehind for start of string or a character


The command

re.compile(ur"(?<=,| |^)(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)

throws the following error in my program:

sre_constants.error: look-behind requires fixed-width pattern

even though regex101 shows the pattern to be fine.

What I'm trying to do is extract landmarks from addresses (each address is a separate string), such as:

  • "Opp foobar, foocity" --> Must match "Opp foobar"
  • "Fooplace, near barplace, barcity" --> Must match "near barplace"
  • "Fooplace, Shoppers Stop, foocity" --> Must match nothing
  • "Fooplace, opp barplace" --> Must match "opp barplace"

The lookbehind is there to avoid matching words that merely contain "opp", such as "Shoppers" in the third string.

Why is that error thrown? Is there an alternative way to get what I'm looking for?
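A minimal reproduction of the error, assuming Python 3 (so plain r strings stand in for the Python 2 ur prefix, which Python 3 rejects). Python's re module raises this error when the alternatives inside a lookbehind can match different numbers of characters: the comma and space alternatives are one character wide, while ^ matches an empty position. The helper compiles and the CANDIDATES table are hypothetical, written only for this illustration.

```python
import re

# Candidate lookbehinds: "," and " " each match one character,
# while "^" matches zero characters (a position, not a character).
CANDIDATES = {
    "comma or space": r"(?<=,| )opp",
    "char class": r"(?<=[, ])opp",
    "comma, space or start": r"(?<=,| |^)opp",
}


def compiles(pattern):
    """Return True if Python's re module accepts the pattern, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for the mixed-width lookbehind:
        # "look-behind requires fixed-width pattern".
        return False


for name, pattern in CANDIDATES.items():
    print(f"{name:<22} {pattern:<18} compiles={compiles(pattern)}")
```

Only the third pattern is rejected, because it is the only one whose alternatives differ in width.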


Accepted answer
re.compile(r"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)

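Moving ^ out of the lookbehind fixes the error. Python's re module requires every alternative inside a lookbehind to match the same number of characters, and in (?<=,| |^) the comma and space alternatives are one character wide while ^ is zero-width. Here the three conditions are combined with [] and | instead: the lookbehind holds only the one-character class [, ], and ^ is an ordinary alternative outside it. See demo: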

https://regex101.com/r/vA8cB3/2#python
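A sketch, assuming Python 3, that runs the accepted answer's pattern against the four sample addresses from the question. The helper find_landmark is hypothetical; it returns the first match or None.

```python
import re

# Accepted answer's pattern, with r instead of the Python 2 ur prefix.
# "^" sits outside the lookbehind, so the lookbehind only holds the
# one-character class [, ] and therefore has a fixed width of 1.
LANDMARK = re.compile(
    r"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)",
    re.IGNORECASE,
)

ADDRESSES = [
    "Opp foobar, foocity",
    "Fooplace, near barplace, barcity",
    "Fooplace, Shoppers Stop, foocity",
    "Fooplace, opp barplace",
]


def find_landmark(address):
    """Return the first landmark phrase in the address, or None."""
    match = LANDMARK.search(address)
    return match.group(0) if match else None


for address in ADDRESSES:
    print(repr(address), "->", repr(find_landmark(address)))
```

For these inputs it yields "Opp foobar", "near barplace", nothing, and "opp barplace", as the question requires. The pattern only constrains what comes before the keyword, so a word that merely begins with one (for example "nearby") would still be matched.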

Second answer

Use re.findall with the regex below. When a pattern contains exactly one capturing group, re.findall returns only the text captured by that group, so the separator before the landmark and the comma after it are left out of the results.

re.compile(r"(?m)(?:[, ]|^)((?:next to|near|beside|opp).+?)(?:,|$)", re.IGNORECASE)
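The same four sample addresses run through the findall-based pattern, again assuming Python 3 and a hypothetical helper find_landmarks. Because the pattern has exactly one capturing group, findall returns only the landmark text rather than the full match.

```python
import re

# Second answer's pattern, with r instead of the Python 2 ur prefix.
# The separator before the keyword and the comma after the phrase are
# consumed, but findall returns only capturing group 1.
LANDMARK = re.compile(
    r"(?m)(?:[, ]|^)((?:next to|near|beside|opp).+?)(?:,|$)",
    re.IGNORECASE,
)

ADDRESSES = [
    "Opp foobar, foocity",
    "Fooplace, near barplace, barcity",
    "Fooplace, Shoppers Stop, foocity",
    "Fooplace, opp barplace",
]


def find_landmarks(address):
    """Return every landmark phrase in the address (possibly an empty list)."""
    return LANDMARK.findall(address)


for address in ADDRESSES:
    print(repr(address), "->", find_landmarks(address))
```

Each address yields a list: ["Opp foobar"], ["near barplace"], [] and ["opp barplace"] respectively. Unlike the lookbehind version, this pattern consumes the surrounding separators, which is why it relies on the capturing group to return clean results.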