TL;DR: How to get re.sub to print out what substitutions it makes, including when using groups?
Kind of like having a verbose option, is it possible to have re.sub print out a message every time it makes a replacement? This would be very helpful for testing how multiple re.sub calls interact with large texts.
I've managed to come up with this workaround for simple replacements utilizing the fact that the repl argument can be a function:
import re

def replacer(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"[A-Z]+", lambda m: repl(m, "CAPS"), text)
    text = re.sub(r"\d+", lambda m: repl(m, "NUMBER"), text)
    return text

replacer("this is a 123 TEST 456", True)
# Log:
# Replacing TEST with CAPS...
# Replacing 123 with NUMBER...
# Replacing 456 with NUMBER...
However, this doesn't work for groups: when repl is a function, re.sub seems to use its return value literally instead of expanding group references such as \1 and \2:
def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

replacer2("ABC123", verbose=True)  # returns r"\2\1"
# Log:
# Replacing ABC123 with \2\1...
Of course, a more sophisticated repl function could be written that actually checks for groups in replacement, but at that point the solution seems too complicated for the goal of just getting re.sub to report on its substitutions. Another potential solution would be to use re.search, report on the match, and then use re.sub to make the replacement. That means scanning the string twice, though: pos and endpos are accepted by Pattern.search and Pattern.finditer, but not by Pattern.sub, so they can't be used to save the sub call from searching the whole string again. Surely there's a better way than either of these options?
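For reference, the search-then-substitute idea described above could look roughly like the sketch below. The name report_then_sub is hypothetical, and this version uses finditer rather than repeated re.search calls; it makes two passes over the string, which is the cost mentioned above.

```python
import re

def report_then_sub(pattern, replacement, text):
    """Hypothetical helper: report every match first, then substitute."""
    compiled = re.compile(pattern)
    # First pass: report each match and what it will be replaced with.
    for m in compiled.finditer(text):
        print(f"Replacing {m.group()} with {m.expand(replacement)}...")
    # Second pass: the substitution scans the string again from the start.
    return compiled.sub(replacement, text)

print(report_then_sub(r"([A-Z]+)(\d+)", r"\2\1", "ABC123"))
```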
Use matchobj.expand(replacement), which processes the replacement string against the current match and performs the group substitutions.
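The original code block for this answer did not survive, so what follows is a minimal sketch of how expand could be dropped into the replacer2 function from the question. The only change is that repl expands the template before logging and returning it; the variable name expanded is my own.

```python
import re

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        # expand() resolves group references such as \1 and \2 for this match
        expanded = matchobj.expand(replacement)
        if verbose:
            print(f"Replacing {matchobj.group()} with {expanded}...")
        return expanded

    return re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)

print(replacer2("ABC123", verbose=True))
# Log:
# Replacing ABC123 with 123ABC...
# 123ABC
```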
More generally, re.sub can be wrapped in a function that adds a verbose option and still allows group patterns to be used by replacement functions.
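The original listing is missing here as well, so this is one possible sketch rather than the answer's exact code. The name verbose_sub and its verbose parameter are hypothetical. It assumes a convention in which whatever a replacement function returns is itself treated as a template and passed through expand, so a backslash in that return value is interpreted as an escape (unlike plain re.sub).

```python
import re

def verbose_sub(pattern, repl, string, count=0, flags=0, verbose=False):
    """Like re.sub, but optionally prints each replacement as it is made."""
    def wrapper(matchobj):
        # A callable repl is invoked first; its result (or the plain string
        # repl) is then expanded so group references like \1 are resolved.
        template = repl(matchobj) if callable(repl) else repl
        replacement = matchobj.expand(template)
        if verbose:
            print(f"Replacing {matchobj.group()!r} with {replacement!r}")
        return replacement

    return re.sub(pattern, wrapper, string, count=count, flags=flags)

# String template with group references
print(verbose_sub(r"([A-Z]+)(\d+)", r"\2\1", "ABC123 DEF456", verbose=True))
# Replacement function that returns a template with group references
print(verbose_sub(r"(\w+)@(\w+)", lambda m: r"\2 at \1", "user@host", verbose=True))
```

Because the wrapper always returns an already expanded string, re.sub inserts it as is, and the logged text matches what actually ends up in the result.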