How can I combine elements in a list at specific points using recursion?

93 Views Asked by At

I have seen various questions on here about combining linked lists with recursion, but none of them seem to quite fit to my problem.

I have a list with elements that combine to make a single string. The elements of the string have 'attachment point' identifiers, of the format "[*:x]" which denote which element to attach where. For example, ["[*:2]DEF","ABC[*:1]GH"] will combine to ABCDEFGH. The process to combine two elements is replacing the attachment points with a character, for example "Z" and then calling a function which combines the two strings at the correct point, using the "Z" character as a locator (so I can't just regex it)

Easy enough when you have 2 elements, but when you have multiple elements that is much more difficult. If I want to combine the elements of the list lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"] then I need a different approach. My predecessor wrote a function to recursively combine the elements as pure strings, which I have been using but now I need to add in the function to combine the elements as pairs.

The old recursive function is here:

lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"]

def combine(r, element, lib):

    if "[*:" in element:
        root = f"[*:{r}]"
        pos = element.index("[*:")
        pos1 = element.index("]", pos)
        next_r = int(element[pos + 3:pos1])
        con = conf[next_r - 1]
        if root in con:
            con = con.replace(root, "")
        return element[0:pos] + combine(next_r, con, lib) + combine(r, element[pos1 + 1:], lib)

    return element

result = combine(1,lib[0],lib)

My semi-functional attempt to modify the above function to be able to use the pairwise combination function pairstitch is here:

import re

lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"]

#Example to make MWE work - actual function is much more complicated
def pairstitch(frag1, frag2, identifier):
     return frag1.replace(identifier, frag2.replace(identifier,""))

def combine(r, smiles, lib):

    if "[*:" in element:
        root = f"[*:{r}]"
        pos = element.index("[*:")
        pos1 = element.index("]", pos)
        next_r = int(element[pos + 3:pos1])
        con = lib[next_r - 1]
        attach_lst_base = re.findall("\[\*\:\d\]", element)
        attach_lst_con = re.findall("\[\*\:\d\]", con)
        if len(attach_lst_con) == 1:
            if root in con:
                next_r_block = f"[*:{next_r}]"
                frag1 = element.replace(next_r_block, "Z")
                frag2 = con.replace(root, "Z")
            else:            
                pos = con.index("[*:")
                pos1 = con.index("]", pos)
                next_r = int(con[pos + 3:pos1])
                next_r_block = con[pos:pos1 + 1]
                frag1 = smiles.replace(root, "Z")
                frag2 = con.replace(next_r_block, "Z")
            element = pairstitch(frag1,frag2,"Z")
        else:
            next_frag = con
            for att_pt in attach_lst_base:
                if len(attach_lst_base) > len(attach_lst_con):
                    next_frag = combine(r, next_frag, conf)
                else:
                    next_frag = combine(next_r, next_frag, lib)

            next_r_block = f"[*:{next_r}]"
            frag1 = smiles.replace(next_r_block, "Z")
            frag2 = next_frag.replace(root, "Z")
            element = pairstitch(frag1, frag2, "Z")

    return element

result = combine(1,lib[0],lib)

However this only goes down one branch of the tree, and gives me ABCDEF[*:2] when what I would expect is ABCDEFGHIJ.

For context, the strings being combined represent chemical fragments which might contain numbers, certain special characters like ":,/,,#,=,@", or sections like "[nH2]", hence the format of the attachment points. As such, splitting the elements up into smaller parts will not work because the function which combines pairs of fragments together does so by various chemical rules to make sure a valid molecule comes out the other end.

2

There are 2 best solutions below

0
On BEST ANSWER

Came back to this after some time working on other things, and I figured it out, so I'm posting the solution here in case it helps anyone else.

The solution lay in adding a parameter to the function which gave the location of the parent to that fragment. This means that if the list to be combined was

lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"]

then starting with element 0 you would be able to examine element 3 next, while knowing that you got there from element 0.

The code itself is here:

import re

lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","A[*:7][*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D", "[*:3]B"]

#Example to make MWE work - actual function is much more complicated
def pairstitch(frag1, frag2, identifier):
     return frag1.replace(identifier, frag2.replace(identifier,""))

def combine(r0, r, element, lib):
    if "[*:" in element:
        root = f"[*:{r}]"
        parent = f"[*:{r0}]"
        attach_lst_base = re.findall("\[\*\:\d\]", element)
        for att_pt in attach_lst_base:
            if att_pt != parent:
                next_r = int(att_pt[3:-1])
                con = lib[next_r - 1]
                attach_lst_con = re.findall("\[\*\:\d\]", con)
                if len(attach_lst_con) != 1:
                    next_frag = combine(r, next_r, con, lib)
                else:
                    next_frag = con
                if root in next_frag:
                    next_r_block = f"[*:{next_r}]"
                    frag1 = element.replace(next_r_block, "Z")
                    frag2 = next_frag.replace(root, "Z")
                element = pairstitch(frag1, frag2, "Z")
        return element

    return element

test = combine(0,1,lib[0],lib)
print(test)

This outputs "ABCDEFGHIJ" as expected, and so gives an example of how a list of elements with defined attachment points can be combined when under the restriction that the elements have to be combined in a chemically aware manner.

1
On

Let's start by converting that format to something easier to work with:

def parse(s):
    return [
        int(r) - 1 if r else c
        for r, c in re.findall(r'\[\*:(\d+)\]|([^\[]+)', s)
    ]

parsed = [parse(s) for s in lib]

which for your example generates

[[3, 'F', 1], [0, 'GH', 4], ['AB', 3], [2, 'C', 5, 'E', 0], [1, 'IJ'], [3, 'D']]

Now, to combine everything in one string, start with arbitrary element (e.g. 0) and repeatedly replace numbers with their respective sublists, keeping track of what's been replaced already:

def expand(parsed, start=0):
    exp = parsed[start]
    seen = set([start])
    done = False

    while not done:
        done = True
        new = []

        for x in exp:
            if isinstance(x, str):
                new.append(x)
                continue
            done = False
            if x not in seen:
                seen.add(x)
                new.extend(parsed[x])

        exp = new

    return ''.join(exp)

which returns ABCDEFGHIJ, as expected.

Not sure what you mean by "combine pairwise"... for example, what would be the result of combining "[*:4]F[*:2]" with "[*:1]GH[*:5]"?