I've written the code below. It works, but I'm sure it could be clearer and faster.
The idea is:
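- I have two input DataFrames and I want one DataFrame as output.
- DF1 has the columns Name, Attribute1, Attribute2, Attribute3, ... (`Name`, `Attr1`, `Attr2` in the code below)
- DF2 has the columns Name1, Name2, Value1, Value2 (`Pair_1`, `Pair_2`, `V1`, `V2` in the code below)

For each row of DF2, I want each NameX to be replaced by that name's attributes from DF1. For example, the DF2 row (A, B, V1_AB, V2_AB) should become (XXX, YYY, YYY, ZZZ, V1_AB, V2_AB).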
```python
import pandas as pd

# DF1: one row per name, with its attributes
dico_1 = {
    'Name': ['A', 'B', 'C'],
    'Attr1': ['XXX', 'YYY', 'XXX'],
    'Attr2': ['YYY', 'ZZZ', 'YYY'],
}
# DF2: pairs of names, with the values attached to each pair
dico_2 = {
    'Pair_1': ['A', 'B', 'B', 'A'],
    'Pair_2': ['B', 'C', 'A', 'C'],
    'V1': ['V1_AB', 'V1_BC', 'V1_BA', 'V1_AC'],
    'V2': ['V2_AB', 'V2_BC', 'V2_BA', 'V2_AC']
}
df1 = pd.DataFrame(dico_1)
df2 = pd.DataFrame(dico_2)

def cons(df1, df2, row):
    """Build the one-row output DataFrame for row `row` of df2."""
    P1 = df2['Pair_1'][row]
    P2 = df2['Pair_2'][row]
    # Look up the attributes of both names in df1
    tmp1 = df1.loc[df1['Name'] == P1, "Attr1":"Attr2"]
    tmp2 = df1.loc[df1['Name'] == P2, "Attr1":"Attr2"]
    # Keep the values of the pair as a one-row DataFrame
    tmp3 = pd.DataFrame(df2.loc[row, "V1":"V2"]).transpose()
    # Reset the indexes so that concat aligns the three pieces on row 0
    tmp1 = tmp1.reset_index(drop=True)
    tmp2 = tmp2.reset_index(drop=True)
    tmp3 = tmp3.reset_index(drop=True)
    # Tell the two sets of attributes apart
    tmp1 = tmp1.add_suffix('_Pair1')
    tmp2 = tmp2.add_suffix('_Pair2')
    a = pd.concat([tmp1, tmp2, tmp3], axis=1)
    return a

df3 = pd.DataFrame(index=range(df2.shape[0]),
                   columns=['Attr1_Pair1', 'Attr2_Pair1', 'Attr1_Pair2', 'Attr2_Pair2', 'V1', 'V2'])
# Fill the output one row at a time
for row in range(df2.shape[0]):
    line = cons(df1, df2, row)
    df3.loc[row] = line.iloc[0]
print(df3)
```
Let's try two merges instead:
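A minimal sketch of the two-merge idea, assuming `df1` and `df2` are built exactly as in the question. Each merge looks up the attributes of one name column; the attribute columns are suffixed beforehand so the output columns match the loop's `df3`. The names `attrs`, `attrs1` and `attrs2` are illustrative, not from the original answer.

```python
import pandas as pd

# Same inputs as in the question
df1 = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Attr1': ['XXX', 'YYY', 'XXX'],
    'Attr2': ['YYY', 'ZZZ', 'YYY'],
})
df2 = pd.DataFrame({
    'Pair_1': ['A', 'B', 'B', 'A'],
    'Pair_2': ['B', 'C', 'A', 'C'],
    'V1': ['V1_AB', 'V1_BC', 'V1_BA', 'V1_AC'],
    'V2': ['V2_AB', 'V2_BC', 'V2_BA', 'V2_AC'],
})

# Attribute table keyed by name (hypothetical helper names)
attrs = df1.set_index('Name')
# Rename the key to match each join column and suffix the attribute columns
attrs1 = attrs.add_suffix('_Pair1').rename_axis('Pair_1').reset_index()
attrs2 = attrs.add_suffix('_Pair2').rename_axis('Pair_2').reset_index()

df3 = (
    df2
    # First merge: attributes of the Pair_1 name
    .merge(attrs1, on='Pair_1', how='left')
    # Second merge: attributes of the Pair_2 name
    .merge(attrs2, on='Pair_2', how='left')
)
# Keep the same column layout as the loop version
df3 = df3[['Attr1_Pair1', 'Attr2_Pair1', 'Attr1_Pair2', 'Attr2_Pair2', 'V1', 'V2']]
print(df3)
```

With `how='left'`, the rows keep the order of `df2`, and a name missing from `df1` produces NaN attributes instead of dropping the row.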
df3: