Why is Python .title() sticky? Applying title case to future strings?

This is a strange one:

I'm trying to format a string in Python, and when I use string.title(), it seems like Python keeps applying title case to the string, even after I apply other formatting to it.

Here's my code:

    def format_trade_name(self):
        tr_name = self.trade_name.title()
        tr_cap = [
            'oxy',
            'depo',
            'edex',
            'emla',
            'pred',
        ]
        tr_join = '|'.join(tr_cap)   
        tr_regex = r'\b(?!(' +tr_join + r'))(\w{1,4})\b'
        tr_matches = re.search(tr_regex, self.trade_name,re.IGNORECASE)
        for i in tr_matches.groups():
            if i is not None:
                tr_name = re.sub(r'\b'+i+r'\b',i.upper(),tr_name)
        return tr_name

Here's the problem: I want the function to capitalize the first letter of each word, and then convert every word of four letters or fewer (not in tr_cap) to upper case. So if the original string is tylenol depo er, I want the formatted string to be Tylenol Depo ER.

When I change the 2nd line to tr_name = self.trade_name.capitalize(), my function changes tylenol depo er into Tylenol depo ER (depo is not capitalized).

When I keep the 2nd line as written, tr_name = self.trade_name.title(), my function changes tylenol depo er to Tylenol Depo Er (Er isn't upper case, even though the formatting was applied after using .title()).

Can anyone explain to me why the string is being converted to Title Case, even after I try to apply new formatting?

UPDATE: I fixed it, but I have no idea why it works. I feel like there's some important principle that I'm missing.

When I change tr_matches = re.search(tr_regex, self.trade_name,re.IGNORECASE) to tr_matches = re.search(tr_regex, tr_name, re.IGNORECASE) it works.

So, this works:

    def format_trade_name(self):
        tr_name = self.trade_name.title()
        tr_cap = [
            'oxy',
            'depo',
            'edex',
            'emla',
            'pred',
        ]
        tr_join = '|'.join(tr_cap)   
        tr_regex = r'\b(?!(' +tr_join + r'))(\w{1,4})\b'
        tr_matches = re.search(tr_regex, tr_name ,re.IGNORECASE)
        for i in tr_matches.groups():
            if i is not None:
                tr_name = re.sub(r'\b'+i+r'\b',i.upper(),tr_name)
        return tr_name

Any ideas why?

There are 1 best solutions below

Tim Nyborg On

You're not using your title-cased string throughout the function:

tr_matches = re.search(tr_regex, self.trade_name,re.IGNORECASE)

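Those matches will be lower case, but your re.sub is searching a title-cased string. re.IGNORECASE applies only to the re.search call; the pattern you build for re.sub is case-sensitive, so \ber\b never matches Er and tr_name comes back unchanged. .title() isn't sticky, the later substitution just never finds anything to replace.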

Switch the code to:

tr_matches = re.search(tr_regex, tr_name, re.IGNORECASE)
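To see the difference in isolation, here is a minimal sketch outside the class. The variable names (original, titled, from_original, and so on) are made up for this illustration; the pattern is the one from the question.

```python
import re

original = "tylenol depo er"
titled = original.title()  # 'Tylenol Depo Er'

pattern = r"\b(?!(oxy|depo|edex|emla|pred))(\w{1,4})\b"

# re.IGNORECASE only affects these searches; the matched text keeps the
# case of whichever string was searched.
from_original = re.search(pattern, original, re.IGNORECASE).group(2)
from_titled = re.search(pattern, titled, re.IGNORECASE).group(2)

# The substitution pattern is case-sensitive, so 'er' never matches 'Er'.
unchanged = re.sub(r"\b" + from_original + r"\b", from_original.upper(), titled)
fixed = re.sub(r"\b" + from_titled + r"\b", from_titled.upper(), titled)

print(from_original, from_titled)  # er Er
print(unchanged)                   # Tylenol Depo Er
print(fixed)                       # Tylenol Depo ER
```

Searching the lower-case string yields er, which cannot be found in the title-cased string; searching the title-cased string yields Er, which can.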

Edit: If you want to upper case multiple substrings, re.search won't do, as it only returns the first match. findall should do the trick. Because the pattern has two groups, it returns a list of tuples, so something like:

tr_matches = re.findall(tr_regex, tr_name, re.IGNORECASE)
print(tr_matches)
# Each match is a (lookahead group, word) tuple; the first item is always ''
for _, i in tr_matches:
    if i:
        tr_name = re.sub(r'\b'+i+r'\b',i.upper(),tr_name)
        print(tr_name)
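A small sketch of how re.search and re.findall differ with this pattern. The input "tylenol depo er xl" is a made-up example with two short words, and the variable names are illustrative only.

```python
import re

pattern = r"\b(?!(oxy|depo|edex|emla|pred))(\w{1,4})\b"
name = "tylenol depo er xl".title()  # 'Tylenol Depo Er Xl'

# search stops at the first match; the group inside the negative
# lookahead never participates, so it is reported as None.
first_only = re.search(pattern, name, re.IGNORECASE).groups()

# findall returns every match as a tuple of groups, with '' standing in
# for the group that did not participate.
all_matches = re.findall(pattern, name, re.IGNORECASE)

for _, word in all_matches:
    name = re.sub(r"\b" + word + r"\b", word.upper(), name)

print(first_only)   # (None, 'Er')
print(all_matches)  # [('', 'Er'), ('', 'Xl')]
print(name)         # Tylenol Depo ER XL
```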

Edit 2: re.sub() is flexible enough that you can remove the whole matching step and loop. It can match each of your 1-4 character words and upper case them with a lambda function:

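    def format_trade_name(self):
        # .title() rather than .capitalize(), so that Depo keeps its capital
        tr_name = self.trade_name.title()
        tr_cap = [
            'oxy',
            'depo',
            'edex',
            'emla',
            'pred',
        ]
        tr_join = '|'.join(tr_cap)
        tr_regex = r'\b(?!' +tr_join + r')(\w{1,4})\b'

        # IGNORECASE lets the lower-case tr_cap entries exclude Depo, Oxy, etc.
        tr_name = re.sub(
            tr_regex,
            lambda match: match.group(0).upper(),
            tr_name,
            flags=re.IGNORECASE
        )
        return tr_name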
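For trying this outside the class, here is a standalone sketch of the same re.sub approach. It assumes the goal is the Tylenol Depo ER output from the question, so it starts from .title() and compiles the pattern with re.IGNORECASE. The module-level names KEEP_TITLE_CASE and SHORT_WORD are hypothetical, not from the original code.

```python
import re

# Hypothetical names for illustration; the word list is the one from the question.
KEEP_TITLE_CASE = ("oxy", "depo", "edex", "emla", "pred")
SHORT_WORD = re.compile(
    r"\b(?!" + "|".join(map(re.escape, KEEP_TITLE_CASE)) + r")(\w{1,4})\b",
    re.IGNORECASE,  # so the lower-case list still excludes 'Depo', 'Oxy', ...
)


def format_trade_name(trade_name):
    """Title-case the name, then upper-case short words not on the keep list."""
    titled = trade_name.title()
    return SHORT_WORD.sub(lambda m: m.group(0).upper(), titled)


print(format_trade_name("tylenol depo er"))  # Tylenol Depo ER
print(format_trade_name("OXY IR"))           # Oxy IR
```

One thing to be aware of: the lookahead rejects any short word that merely starts with a listed string, so a made-up four-letter name like oxyb would also stay in title case. If only exact words should be excluded, putting a \b after the alternation inside the lookahead should restrict it.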