I'm using a regex in a Python script to capture a named group. The group occurs either before or after the delimiter string "_S_". My problem is that I can't use the same named capturing group twice in one regex.
I'd like to use the following invalid (named group used twice) regex:
(?:^STD_S_)(?P<important>.+?)$|^(?P<important>.+?)(?:_S_STD)$
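For reference, here is a small check of what Python's standard re module does with this pattern. It is a sketch that assumes the question's regex with the stray opening parenthesis of the second alternative dropped. The helper name compiles is made up for illustration. re refuses to compile the pattern because the group name is redefined.

```python
import re

# The question's pattern: the name "important" is defined in both alternatives.
DUPLICATE_NAME_PATTERN = r"(?:^STD_S_)(?P<important>.+?)$|^(?P<important>.+?)(?:_S_STD)$"


def compiles(pattern: str) -> bool:
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # re reports the redefinition of the group name here.
        print(f"re.error: {exc}")
        return False
    return True


print(compiles(DUPLICATE_NAME_PATTERN))  # prints the re.error message, then False
```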
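Description:
(?:^STD_S_)  non-capturing group: the string starts with "STD_S_", a standard string plus the delimiter
(?P<important>.+?)  named group capturing the important string I want
|  OR
^(?P<important>.+?)(?:_S_STD)$  the string starts with the important part and ends with "_S_STD", the delimiter plus the standard string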
I would really like the captured important group to be named. I can remove the names and get this to work. I can also split the single expression into two expressions (one for each side of the OR) and choose which one to use with some logic in the Python script.
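Both workarounds mentioned here can be sketched with the standard re module. This is one possible version, not the asker's actual script. The function names extract_two_patterns and extract_unnamed are hypothetical. Option 1 keeps the name by using two patterns. Option 2 drops the names and uses whichever numbered group was set.

```python
import re
from typing import Optional

# Option 1: two separate patterns, each with its own "important" group.
PREFIX_PATTERN = re.compile(r"^STD_S_(?P<important>.+?)$")
SUFFIX_PATTERN = re.compile(r"^(?P<important>.+?)_S_STD$")


def extract_two_patterns(text: str) -> Optional[str]:
    """Try the "STD_S_..." form first, then the "..._S_STD" form."""
    for pattern in (PREFIX_PATTERN, SUFFIX_PATTERN):
        match = pattern.match(text)
        if match:
            return match.group("important")
    return None


# Option 2: one pattern with unnamed groups. Only the group belonging to the
# alternative that matched is set; the other one is None.
UNNAMED_PATTERN = re.compile(r"^STD_S_(.+?)$|^(.+?)_S_STD$")


def extract_unnamed(text: str) -> Optional[str]:
    match = UNNAMED_PATTERN.match(text)
    if match is None:
        return None
    return match.group(1) if match.group(1) is not None else match.group(2)


for sample in ("STD_S_important", "important_S_STD", "unrelated"):
    print(sample, extract_two_patterns(sample), extract_unnamed(sample))
```

Option 1 is the easiest to read. Option 2 keeps a single regex, but the caller has to remember which group number belongs to which side.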
Thanks!
EXAMPLE INPUTS
STD_S_important
important_S_STD
EXAMPLE OUTPUTS
important #returned by calling the important named group
important
Regex based on the comments, which doesn't match the second case:
(?:(?:^STD_S_)(?P<important>.+?)$)|(?:^(?P=important)(?:_S_STD)$)
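The reason this attempt fails on the second input is that (?P=important) is a backreference. It matches the text the group already captured, not the group's pattern. When the second alternative runs, the first alternative hasn't set the group, and a backreference to an unset group never matches in Python's re. A quick check:

```python
import re

# The attempt from the question: the second alternative uses (?P=important),
# a backreference to whatever the first alternative captured.
BACKREF_PATTERN = re.compile(
    r"(?:(?:^STD_S_)(?P<important>.+?)$)|(?:^(?P=important)(?:_S_STD)$)"
)

first = BACKREF_PATTERN.match("STD_S_important")
second = BACKREF_PATTERN.match("important_S_STD")

print(first.group("important"))  # important
print(second)  # None: the group is unset when the second alternative runs
```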
The general form of the regex is:
A(?P<name>B)|(?P<name>B)C
Since a group name can't be repeated, the named group must instead go around the whole alternation. This causes another issue: the prefix A and the suffix C are then captured in the named group as well. To keep them out of the group, you can use lookarounds, a lookbehind for the prefix and a lookahead for the suffix, giving (?P<name>(?<=A)B|B(?=C)). Note that this only works when the prefix has a fixed length, because Python's re module only supports fixed-width lookbehinds. If part of the prefix or suffix itself should be captured, you can add capturing groups inside the lookarounds. Anchors cannot be placed next to the lookarounds; they must go inside them, otherwise they create mutually exclusive requirements (for example, ^(?<=A) would require the match to start at position 0 and also be preceded by A).
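The two lookbehind rules (fixed-length prefix, anchor inside the lookaround) can be checked directly with Python's re module. This sketch uses the question's STD_S_ prefix. The variable-length prefix STD_+S_ is a made-up example chosen only to trigger the fixed-width error.

```python
import re

TEXT = "STD_S_important"

# Anchor inside the lookbehind: match right after a "STD_S_" that itself
# starts at the beginning of the string.
inside = re.search(r"(?<=^STD_S_).+", TEXT)

# Anchor next to the lookbehind: the match would have to start at position 0
# and also be preceded by "STD_S_", which can never both be true.
beside = re.search(r"^(?<=STD_S_).+", TEXT)

print(inside.group())  # important
print(beside)          # None

# re only accepts fixed-width lookbehinds, so a variable-length prefix
# (here: one or more underscores) is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=^STD_+S_).+")
    variable_width_rejected = False
except re.error as exc:
    print(f"re.error: {exc}")
    variable_width_rejected = True
```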
For the regex in question, this gives:
(?P<important>(?<=^STD_S_).+?$|^.+?(?=_S_STD$))
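A sketch of using the lookaround approach from Python. It assumes the pattern (?P<important>(?<=^STD_S_).+?$|^.+?(?=_S_STD$)), built from the general form with A = STD_S_, B = .+?, C = _S_STD, and the anchors moved into the lookarounds. The helper extract_important is hypothetical. It uses re.search rather than re.match, because in the prefix case the match begins after "STD_S_", not at position 0.

```python
import re
from typing import Optional

# The named group wraps the whole alternation; the prefix and suffix sit in
# zero-width lookarounds, so they are required but not captured.
IMPORTANT_PATTERN = re.compile(
    r"(?P<important>(?<=^STD_S_).+?$|^.+?(?=_S_STD$))"
)


def extract_important(text: str) -> Optional[str]:
    # search, not match: for "STD_S_..." the match starts at index 6.
    match = IMPORTANT_PATTERN.search(text)
    return match.group("important") if match else None


for sample in ("STD_S_important", "important_S_STD", "unrelated"):
    print(sample, "->", extract_important(sample))
```

Both example inputs yield "important" from the single named group, and an input with neither affix yields None.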
Alternatively, the third-party regex module (available on PyPI, not the standard-library re) allows the same group name to be used for multiple groups, with the last capture taking precedence.