How to merge two columns by the intersection of the elements in each col?

55 Views Asked by At

Imagine I have a dataframe like this: With lists of elements in a single string.

data = {'Col1': ["apple, banana, orange", "dog, cat", "python, java, c++"],
        'Col2': ["banana, lemon, blueberry", "bird, cat", "R, fortran"]
       }
df = pd.DataFrame(data)
df

How can I create a Col3 with the intersection of elements in Col1 and Col2

Expected output:

data = {'Col1': ["apple, banana, orange", "dog, cat", "python, java, c++"],
        'Col2': ["banana, lemon, blueberry", "bird, cat", "R, fortran"],
        'Col3': ["banana", "cat", NA]
       }
df = pd.DataFrame(data)
df
2

There are 2 best solutions below

2
mozway On BEST ANSWER

Using a list comprehension and set intersection:

df['Col3'] = [', '.join(set(a.split(', ')) & set(b.split(', ')))
              for a,b in zip(df['Col1'], df['Col2'])]

Output:

                    Col1                      Col2    Col3
0  apple, banana, orange  banana, lemon, blueberry  banana
1               dog, cat                 bird, cat     cat
2      python, java, c++                R, fortran        

If you want NAs on empty intersections:

df['Col3'] = [x if (x:=', '.join(set(a.split(', ')) & set(b.split(', '))))
              else pd.NA
              for a,b in zip(df['Col1'], df['Col2'])]

Output:

                    Col1                      Col2    Col3
0  apple, banana, orange  banana, lemon, blueberry  banana
1               dog, cat                 bird, cat     cat
2      python, java, c++                R, fortran    <NA>
0
Andrej Kesely On

One possible solution would be to convert Col1 and Col2 to sets and do an intersection:

s1 = df["Col1"].apply(lambda x: set(map(str.strip, x.split(","))))
s2 = df["Col2"].apply(lambda x: set(map(str.strip, x.split(","))))

df["Col3"] = pd.concat([s1, s2], axis=1).apply(
    lambda x: ", ".join(set.intersection(x.Col1, x.Col2)) or None, axis=1
)
print(df)

Prints:

                    Col1                      Col2    Col3
0  apple, banana, orange  banana, lemon, blueberry  banana
1               dog, cat                 bird, cat     cat
2      python, java, c++                R, fortran    None