Regex match items in list + trailing N numbers (Python)

I have a list of expected animals:

expectedAnimals = ['cat-', 'snake-', 'hedgehog-']

Then I have user input (a string) that contains some or all of the expected animals from the list above, each followed by N digits. The animals are separated by arbitrary non-digit delimiters:

Examples:

inputString1 = 'cat-235##randomtext-123...snake-1,dog-2:snake-22~!cat-8844'
inputString2 = 'hedgehog-2>cat-1|snake-22#cat-2<$dog-55 snake-93242522. cat-3 .rat-2 snake-22 cat-8844'

My goal (with which I am struggling) is to write the function filterAnimals that should return the following correct results:

approvedAnimals1 = filterAnimals(inputString1)

['cat-235', 'snake-1', 'snake-22', 'cat-8844']

approvedAnimals2 = filterAnimals(inputString2)

['hedgehog-2', 'cat-1', 'snake-22', 'cat-2', 'snake-93242522', 'cat-3', 'snake-22', 'cat-8844']

My current implementation works partially but honestly I would like to re-write it from scratch:

def filterAnimals(inputString):
    expectedAnimals = ['cat-', 'snake-', 'hedgehog-']
    start_indexes = []
    end_indexes = []
    for animal in expectedAnimals:
        temp_start_indexes = [i for i in range(len(inputString)) if inputString.startswith(animal, i)]
        if len(temp_start_indexes) > 0:
            start_indexes.append(temp_start_indexes)
            for start_ind in temp_start_indexes:
                for i in range(start_ind + len(animal), len(inputString)):
                    if inputString[i].isdigit() and i == len(inputString) - 1:
                        end_indexes.append(i + 1)
                        break
                    if not inputString[i].isdigit():
                        end_indexes.append(i)
                        break
        start_indexes_flat = [item for sublist in start_indexes for item in sublist]
        list_size = min(len(start_indexes_flat), len(end_indexes))
        approvedAnimals = []
        if list_size > 0:
            for x in range(list_size):
                approvedAnimals.append(inputString[start_indexes_flat[x]:end_indexes[x]])
    return approvedAnimals

There are 2 best solutions below


import re

# matches an expected animal name, a hyphen, then one or more digits
# (raw string so that \d is not treated as a string escape sequence)
pattern = re.compile(r"(cat|snake|hedgehog)-\d+")

inputString1 = 'cat-235##randomtext-123...snake-1,dog-2:snake-22~!cat-8844'
inputString2 = 'hedgehog-2>cat-1|snake-22#cat-2<$dog-55 snake-93242522. cat-3 .rat-2 snake-22 cat-8844'

animals_1 = [i.group() for i in pattern.finditer(inputString1)]
# will return ['cat-235', 'snake-1', 'snake-22', 'cat-8844']

animals_2 = [i.group() for i in pattern.finditer(inputString2)]
# will return ['hedgehog-2', 'cat-1', 'snake-22', 'cat-2', 'snake-93242522', 'cat-3', 'snake-22', 'cat-8844']
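
A note on why the snippet above uses finditer with group() rather than findall: when a pattern contains a capturing group, re.findall returns only the captured part. The short sketch below illustrates the difference; the variable names and the sample string are made up for illustration and are not part of the original answer.

```python
import re

text = "cat-235##snake-1,dog-2"  # illustrative sample input

capturing = re.compile(r"(cat|snake|hedgehog)-\d+")
non_capturing = re.compile(r"(?:cat|snake|hedgehog)-\d+")

# With a capturing group, findall returns only the group contents.
names_only = capturing.findall(text)

# finditer + group() returns the whole match regardless of groups.
full_via_group = [m.group() for m in capturing.finditer(text)]

# With a non-capturing group, findall returns the whole match as well.
full_via_findall = non_capturing.findall(text)

print(names_only)        # ['cat', 'snake']
print(full_via_group)    # ['cat-235', 'snake-1']
print(full_via_findall)  # ['cat-235', 'snake-1']
```

Either form should work for this question; the non-capturing version is just slightly shorter to call.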

You can build an alternation pattern from expectedAnimals and use re.findall to find all matches as a list:

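import re

# the list from the question; each entry already ends with the hyphen
expectedAnimals = ['cat-', 'snake-', 'hedgehog-']

def filterAnimals(inputString):
    # (?:...) is non-capturing, so findall returns the full matches
    return re.findall(rf"(?:{'|'.join(expectedAnimals)})\d+", inputString)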

Demo: https://replit.com/@blhsing/OffensiveEveryWebportal
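
If the list of animals might ever contain regex metacharacters, or if an input such as 'bobcat-5' should not count as a cat, a slightly stricter variant may be useful. The sketch below is only one possible approach: the function name filterAnimalsStrict is hypothetical, and rejecting names that are preceded by a letter or digit is an assumption, since the question does not say how such input should be handled.

```python
import re

expectedAnimals = ['cat-', 'snake-', 'hedgehog-']

def filterAnimalsStrict(inputString, animals=expectedAnimals):
    # Hypothetical helper, not from the original answers.
    if not animals:
        return []
    # re.escape guards against metacharacters in the animal names.
    alternation = '|'.join(re.escape(a) for a in animals)
    # Assumption: a name glued to a preceding letter or digit
    # (e.g. 'bobcat-5') should be rejected, hence the negative lookbehind.
    pattern = rf"(?<![A-Za-z0-9])(?:{alternation})\d+"
    return re.findall(pattern, inputString)

inputString1 = 'cat-235##randomtext-123...snake-1,dog-2:snake-22~!cat-8844'
result1 = filterAnimalsStrict(inputString1)
print(result1)  # ['cat-235', 'snake-1', 'snake-22', 'cat-8844']

strict_result = filterAnimalsStrict('bobcat-5,cat-6')
print(strict_result)  # ['cat-6']
```

For the two example inputs in the question this gives the same lists as the simpler version; it differs only on inputs where an animal name appears as the tail of a longer word.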