I would like to match text that is surrounded by one of two allowed symbols (let's say & and #). Whichever of the two symbols is used before the text must also follow it; mixing the two is not allowed (e.g. &Time& and #Time# are valid but &Time# is not). I would like to do this with a lookbehind and a lookahead, capturing the first symbol in a group. But when I try this, the lookbehind and lookahead parts are also included in the match. Is it possible to extract just the text using lookbehind and lookahead with a backreference?

r"(?<=(&|#))([A-Za-z]+)(?=(\1))" matches the whole string &Hawai&#Rome# instead of extracting only Hawai and Rome

There is 1 best solution below

JvdV On

In your current pattern you are using a third, unnecessary capture group around \1. You could use (?<=([&#]))([A-Za-z]+)(?=\1), where the delimiter is captured inside the lookbehind so that \1 refers to it.
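As a minimal sketch of how the lookaround version can yield only the text: the overall match (group 0) never includes the zero-width lookarounds, so iterating with re.finditer and reading m.group(0) gives the bare words. This assumes the delimiter is captured inside the lookbehind as (?<=([&#])); the variable names are just for illustration.

```python
import re

s = '&Hawai&#Rome#'

# Group 1 is the delimiter (captured inside the lookbehind), group 2 is the text.
pattern = re.compile(r'(?<=([&#]))([A-Za-z]+)(?=\1)')

# group(0) is the overall match; lookarounds are zero-width, so it is text only.
texts = [m.group(0) for m in pattern.finditer(s)]
print(texts)  # ['Hawai', 'Rome']

# findall() returns one tuple per match because the pattern has two groups.
pairs = pattern.findall(s)
print(pairs)  # [('&', 'Hawai'), ('#', 'Rome')]
```

The tuples from findall() are probably what made it look as if the symbols were part of the match.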

However, since re.findall() in Python returns every capture group when the pattern contains more than one, I think you might as well drop the lookarounds and pick out the 2nd capture group with a list comprehension, using this pattern:

([&#])([A-Za-z]+)\1

In code:

import re

s = '&Hawai&#Rome#'
# findall() returns (delimiter, text) tuples; keep only the text (index 1).
texts = [x[1] for x in re.findall(r'([&#])([A-Za-z]+)\1', s)]
print(texts)

Prints:

['Hawai', 'Rome']
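One difference between the two approaches is worth checking against your data. The sketch below (the helper names and test strings are made up for illustration) compares the consuming pattern with a lookaround variant that captures the delimiter inside the lookbehind. Both reject mixed delimiters such as &Time#, but they differ when two words share a delimiter: the consuming pattern uses up the closing symbol, so it cannot also open the next word. Which behaviour is right depends on your requirements.

```python
import re

# Consuming version: the delimiters are part of the overall match.
consuming = re.compile(r'([&#])([A-Za-z]+)\1')
# Lookaround version: the delimiters are only asserted, not consumed.
lookaround = re.compile(r'(?<=([&#]))([A-Za-z]+)(?=\1)')


def extract(pattern, s):
    """Return the text (group 2) of every match of pattern in s."""
    return [m.group(2) for m in pattern.finditer(s)]


mixed = '&Time& #Time# &Time#'   # the last item mixes & and #
shared = '&a&b&'                 # the middle & closes 'a' and opens 'b'

print(extract(consuming, mixed))    # ['Time', 'Time']
print(extract(lookaround, mixed))   # ['Time', 'Time']
print(extract(consuming, shared))   # ['a']
print(extract(lookaround, shared))  # ['a', 'b']
```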