Split by pipe ("|") with a negative lookahead/lookbehind


I have this code:

import re

my_string = "(chargeur|magn(e|é)tique) sans fil|chargeur solaire (imperm(e|é)able|pliable)|chargeur ext(e|é)rieur externe|chargeur de t(e|é)l(e|é)phone solaire"
my_list = re.split(r'\|\s*(?![^()]*\))', my_string)
print(my_list)

I am trying to split on the pipe character ("|") while taking the parentheses into account. For example, my list should look something like this: my_list = ['(chargeur|magn(e|é)tique) sans fil', 'chargeur solaire (imperm(e|é)able|pliable)', ...]

But instead I get this: my_list = ['(chargeur', 'magn(e|é)tique) sans fil', 'chargeur solaire (imperm(e|é)able|pliable)', ...]

I know there is an approach using negative lookahead and lookbehind, but I don't really understand how I should combine them to account for all the parentheses. Thank you!


There is 1 best solution below

alinapal On

I think I have found the solution:

def split_string(string):
    result = []   # finished top-level pieces
    current = []  # characters of the piece being built
    level = 0     # current parenthesis nesting depth
    for char in string:
        if char == "(":
            level += 1
        elif char == ")":
            level -= 1
        # A pipe outside all parentheses ends the current piece.
        if char == "|" and level == 0:
            result.append("".join(current))
            current = []
            continue
        current.append(char)
    # Append whatever follows the last top-level pipe.
    result.append("".join(current))
    return result

my_string = "(chargeur|magn(e|é)tique) sans fil|chargeur solaire (imperm(e|é)able|pliable)|chargeur ext(e|é)rieur externe|chargeur de t(e|é)l(e|é)phone solaire"
my_list = split_string(my_string)
print(my_list)

The code uses a counter, level, to track the nesting depth of the parentheses. Each left parenthesis increases the level and each right parenthesis decreases it. The string is split at a pipe only when the level is 0, meaning that the pipe is not inside any parentheses.
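To see why the regex from the question is not enough, it may help to run it on a flat and a nested input side by side. The sketch below is only an illustration: the short test strings and the name split_top_level are made up for this example, and the check for unbalanced parentheses is an addition that is not part of the answer above. The lookahead (?![^()]*\)) only asks whether a ")" can be reached without crossing another parenthesis, so it stops looking as soon as it meets a nested "(".

```python
import re

# The pattern from the question: split on "|" unless a ")" follows
# with no other parenthesis in between.
PATTERN = r'\|\s*(?![^()]*\))'


def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators inside parentheses.

    Hypothetical variant of the counter approach that also rejects
    unbalanced parentheses instead of silently mis-splitting.
    """
    parts, current, depth = [], [], 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
        if char == sep and depth == 0:
            # Top-level separator: close the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    if depth != 0:
        raise ValueError("unmatched '('")
    parts.append("".join(current))
    return parts


flat = "a (b|c) d|e"        # one level of parentheses
nested = "(a|b(c|d)e) f|g"  # a group nested inside another group

print(re.split(PATTERN, flat))    # ['a (b|c) d', 'e']
print(re.split(PATTERN, nested))  # ['(a', 'b(c|d)e) f', 'g']
print(split_top_level(nested))    # ['(a|b(c|d)e) f', 'g']
```

On the flat string the regex behaves as intended. On the nested string, the first pipe is followed by "b(", so the lookahead cannot find a ")" before another parenthesis and the split happens anyway. Counting depth avoids this because it does not depend on what follows the pipe.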