Perform Grouped-by Cartesian Cross-join of Two Dataframes

45 Views Asked by At

I want to build the cartesian product of two dataframes after grouping.

import pandas as pd

df1 = pd.DataFrame({"group": [1, 1, 1, 1, 2], "A": [3, 4, 3, 4, 3], "B": [5, 5, 6, 6, 7]})
df2 = pd.DataFrame({"group": [1, 1, 2, 2], "A": [3, 4, 5, 6], "B": [6, 6, 6, 7]})

target = pd.DataFrame({"A_x": [3] * 4 + [4] * 4 + [5, 6],
                       "B_x": [6] * 8 + [6, 7],
                       "A_y": [3, 4, 3, 4] * 2 + [3, 3],
                       "B_y": [5, 5, 6, 6] * 2 + [7, 7],
                       "group": [1] * 8 + [2, 2],
                       })

>>> df1
   group  A  B
0      1  3  5
1      1  4  5
2      1  3  6
3      1  4  6
4      2  3  7


>>> df2
   group  A  B
0      1  3  6
1      1  4  6
2      2  5  6
3      2  6  7

>>> target
   A_x  B_x  A_y  B_y  group
0    3    6    3    5      1
1    3    6    4    5      1
2    3    6    3    6      1
3    3    6    4    6      1
4    4    6    3    5      1
5    4    6    4    5      1
6    4    6    3    6      1
7    4    6    4    6      1
8    5    6    3    7      2
9    6    7    3    7      2

Currently I'm doing this in a loop, but I'm sure there must be a more pandas-like solution that's more performant on bigger datasets

df = pd.DataFrame()
for group in df1.group.unique():
    df_merged = df2.loc[df2.group == group].merge(df1.loc[df1.group == group], "cross")
    df = pd.concat([df, df_merged], axis=0)

df["group"] = df.group_x
df.drop(["group_x", "group_y"], axis=1, inplace=True)
df.reset_index(drop=True, inplace=True)

>>> df.equals(target)
True
0

There are 0 best solutions below