Handling backreferences in re.sub when replacement includes numbers


Take the following simple regex replacement:

import re
s = "Python version is: 3.10"
pat = r'(is:.*)\d+\.\d+$'
version = "3.12"
result = re.sub(pat, rf'\1{version}', s)
print(result)

This fails with:

Traceback (most recent call last):
    ...
    raise s.error("invalid group reference %d" % index, pos)
re.error: invalid group reference 13 at position 1

What is happening is that the replacement string ends up as \13.12, and re's template parser reads \13 as a reference to group 13, absorbing the first "3" of the version string into the group number.

I've tried various iterations of:

re.sub(pat, rf'\1{version}', s)
re.sub(pat, f'\\1{version}', s)
re.sub(pat, r'\1' + version, s)
re.sub(pat, r'\1{0}'.format(version), s)
re.sub(pat, r'\1' + f"{version}", s)

But none of them treats the version as literal text: each one builds the same replacement string, \13.12. Am I stuck using a named capture group for this?
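To make the failure concrete, here is a small sketch (reusing the names from the question) showing that the five attempts are only different spellings of the same string, so re.sub receives identical input each time and fails identically:

```python
import re

s = "Python version is: 3.10"
pat = r'(is:.*)\d+\.\d+$'
version = "3.12"

# The five attempts from the question, collected for comparison.
candidates = [
    rf'\1{version}',
    f'\\1{version}',
    r'\1' + version,
    r'\1{0}'.format(version),
    r'\1' + f"{version}",
]

# Every variant yields the same characters: backslash, 1, 3, ., 1, 2
unique = set(candidates)
print(unique)

# re.sub parses the template and reads the digits after the backslash
# as one group number, so it looks for group 13 instead of group 1.
try:
    re.sub(pat, candidates[0], s)
    message = None
except re.error as exc:
    message = str(exc)
print(message)
```

How the string is assembled in Python makes no difference; the ambiguity only appears later, when re parses the finished replacement template.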

brandonscript On

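Python's re.sub accepts the \g<...> group syntax in replacement strings, which works for both numbered and named capture groups. Because the group reference is delimited by angle brackets, \g<1> is unambiguous even when a digit follows it, and no named group is required: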

import re
s = "Python version is: 3.10"
pat = r'(is:.*)\d+\.\d+$'
version = "3.12"
result = re.sub(pat, rf'\g<1>{version}', s)
print(result)

The same syntax works with a named capture group:

import re
s = "Python version is: 3.10"
pat = r'(?P<keep>is:.*)\d+\.\d+$'
version = "3.12"
result = re.sub(pat, rf'\g<keep>{version}', s)
print(result)
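As an alternative sketch, not part of the original answer: re.sub also accepts a function as the replacement, in which case no template parsing happens at all and the version is appended as plain text. The helper name replace_version is made up for illustration.

```python
import re

s = "Python version is: 3.10"
pat = r'(is:.*)\d+\.\d+$'
version = "3.12"

def replace_version(match):
    # Hypothetical helper: keep the captured prefix and append the
    # version verbatim; the return value is not parsed for \1-style escapes.
    return match.group(1) + version

result = re.sub(pat, replace_version, s)
print(result)  # Python version is: 3.12
```

This form may be preferable when the inserted text comes from outside the program, since backslashes or digits in it cannot be mistaken for group references.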