I want to build the cartesian product of two dataframes after grouping.
import pandas as pd
df1 = pd.DataFrame({"group": [1, 1, 1, 1, 2], "A": [3, 4, 3, 4, 3], "B": [5, 5, 6, 6, 7]})
df2 = pd.DataFrame({"group": [1, 1, 2, 2], "A": [3, 4, 5, 6], "B": [6, 6, 6, 7]})
target = pd.DataFrame({"A_x": [3] * 4 + [4] * 4 + [5, 6],
"B_x": [6] * 8 + [6, 7],
"A_y": [3, 4, 3, 4] * 2 + [3, 3],
"B_y": [5, 5, 6, 6] * 2 + [7, 7],
"group": [1] * 8 + [2, 2],
})
>>> df1
group A B
0 1 3 5
1 1 4 5
2 1 3 6
3 1 4 6
4 2 3 7
>>> df2
group A B
0 1 3 6
1 1 4 6
2 2 5 6
3 2 6 7
>>> target
A_x B_x A_y B_y group
0 3 6 3 5 1
1 3 6 4 5 1
2 3 6 3 6 1
3 3 6 4 6 1
4 4 6 3 5 1
5 4 6 4 5 1
6 4 6 3 6 1
7 4 6 4 6 1
8 5 6 3 7 2
9 6 7 3 7 2
Currently I'm doing this in a loop, but I'm sure there must be a more pandas-like solution that's more performant on bigger datasets
df = pd.DataFrame()
for group in df1.group.unique():
df_merged = df2.loc[df2.group == group].merge(df1.loc[df1.group == group], "cross")
df = pd.concat([df, df_merged], axis=0)
df["group"] = df.group_x
df.drop(["group_x", "group_y"], axis=1, inplace=True)
df.reset_index(drop=True, inplace=True)
>>> df.equals(target)
True