Split string based on delimiter into specific substrings in Python, stored in multiple columns

I would like to split this string A->B->C->D->E->F into substrings as A->B,B->C,C->D,D->E,E->F.

I tried using split with '->' as the delimiter, but that doesn't give the output I want. I would really appreciate some help here!

I have multiple such values in a column of a DataFrame. I would like to create as many new columns as the maximum number of splits and then store each split in its respective column.

There are 3 best solutions below

mozway On

You can use str.split, itertools.pairwise (available from Python 3.10), map and str.join:

from itertools import pairwise

s = 'A->B->C->D->E->F'
out = ','.join(map('->'.join, pairwise(s.split('->'))))

Output:

'A->B,B->C,C->D,D->E,E->F'

Similar logic if you have a Series/DataFrame:

from itertools import pairwise

import pandas as pd

df = pd.DataFrame({'Input': ['A->B->C->D',
                             'X->Y->Z',
                             'A->B->Z->D->Y',
                             'X->Y->A->E->F']})

out = df.join(pd.DataFrame([['->'.join(x) for x in pairwise(s.split('->'))]
                            for s in df['Input']])
                .rename(columns=lambda x: f'split {x+1}'))

Output:

           Input split 1 split 2 split 3 split 4
0     A->B->C->D    A->B    B->C    C->D     NaN
1        X->Y->Z    X->Y    Y->Z     NaN     NaN
2  A->B->Z->D->Y    A->B    B->Z    Z->D    D->Y
3  X->Y->A->E->F    X->Y    Y->A    A->E    E->F
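
itertools.pairwise only exists from Python 3.10 onwards. On an older interpreter, a minimal sketch of the same pairing could zip the list of parts with itself shifted by one position. The helper name `split_pairs` is made up for illustration and is not part of any library:

```python
def split_pairs(s, sep='->'):
    """Return each pair of adjacent parts of s, re-joined with sep."""
    parts = s.split(sep)
    # zip stops at the shorter input, so this pairs parts[0] with parts[1],
    # parts[1] with parts[2], and so on up to the last two parts.
    return [sep.join(pair) for pair in zip(parts, parts[1:])]


pairs = split_pairs('A->B->C->D->E->F')
joined = ','.join(pairs)
print(pairs)
print(joined)
```

A string that does not contain the delimiter gives an empty list, since there is no adjacent pair to build.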
Braian Pita On

You can keep what you already did to get all of the letters into a list, in order, and then use a for loop that builds each desired substring from the items at index "idx" and "idx + 1":

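my_string = "A->B->C->D->E->F"
items = my_string.split("->")  # ['A', 'B', 'C', 'D', 'E', 'F']
substrings = []

# Pair each item with the one that follows it.
for idx in range(len(items) - 1):
    substrings.append(items[idx] + "->" + items[idx + 1])

# substrings is now ['A->B', 'B->C', 'C->D', 'D->E', 'E->F']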
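
The question also asks for one column per split across several values. As a rough sketch of how the loop above might extend to that case, the snippet below wraps it in a function, runs it over a few sample strings, and pads the shorter rows so every row has as many entries as the longest one. The names `adjacent_pairs`, `values`, `padded` and `columns` are invented for this example, and the "split N" column labels are just one possible naming:

```python
def adjacent_pairs(value, sep="->"):
    """Build 'X->Y' substrings from each item and the item after it."""
    items = value.split(sep)
    return [items[idx] + sep + items[idx + 1] for idx in range(len(items) - 1)]


values = ["A->B->C->D", "X->Y->Z"]  # sample data, assumed for illustration
rows = [adjacent_pairs(v) for v in values]

# The widest row decides how many columns are needed.
width = max(len(row) for row in rows)

# Pad shorter rows with None so that all rows have the same length.
padded = [row + [None] * (width - len(row)) for row in rows]

# Column-oriented view: one list per new column.
columns = {f"split {i + 1}": [row[i] for row in padded] for i in range(width)}
print(columns)
```

If pandas is in use, a dict of equal-length lists like `columns` can be passed to pd.DataFrame to get the new columns.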
SIGHUP On

It is unclear from the question, but a list of substrings may be what is required. In that case:

from itertools import pairwise

s = "A->B->C->D->E->F"

result = ["->".join(p) for p in pairwise(s.split("->"))]

print(result)

Output:

['A->B', 'B->C', 'C->D', 'D->E', 'E->F']