Dataframe merge and create multiple columns with multiple matches

70 Views Asked by At

I have two pandas dataframes. DF1 has columns A B and C. DF2 has columns B and C. I want to create DF3 with columns A B C D and E.

I want column A B and C to be the same and in the same order as column A B and C in DF1. Then I want column D to be the values in DF2 column B when DF2 B and C both match DF1 B and C. Then I want DF3 column E to be the values in DF2 column B when DF2 column B matches DF1 B but the column C amounts do not match.

To summarize:

Columns:

A - same as DF1 A

B - same as DF1 B

C - same as DF1 C

D - the values of DF2 B when both B and C match DF1.

E - the values of DF2B when only B matches DF1 and c does not.

Thanks in advance!

Here is an example:

DF1

DF2

DF3

Here is the same example in text:

DF1:
A B C
AAA 123 456
BBB 789 888
CCC 999 111
DDD 555 444
DF2:
B C
123 456
789 888
789 900
789 950
999 111
555 000
555 444
DF3:
A B C D E
AAA 123 456 456 nan
BBB 789 888 888 900, 950
CCC 999 111 111 nan
DDD 555 444 444 000

I have tried to do a merge like the following: df1.merge(df2['C'].astype(str).groupby(df2['B']).agg(', '.join), on='B', how='left;).fillna('0') which has worked when I want to match on one column (for example B), and pull in the information in C. But I am not familiar with how to pull in multiple columns when one needs to match twice, and one needs to match only once, discluding any that match in both columns.

1

There are 1 best solutions below

0
mozway On

You can merge on "B", then assign "D"/"E" depending on the equality of the two "C" columns. Finally, pivot_table to reshape your DataFrame:

out = (df1
  .merge(df2, on='B', how='left', suffixes=(None, '_'))
  .assign(col=lambda d: np.where(d['C'].eq(d['C_']), 'D', 'E'))
  .pivot_table(index=['A', 'B', 'C'], columns='col', values='C_',
               aggfunc=lambda x: ','.join(x.dropna().astype(str)))
  .reset_index().rename_axis(columns=None)
)

Output:

     A    B    C    D        E
0  AAA  123  456  456      NaN
1  BBB  789  888  888  900,950
2  CCC  999  111  111      NaN
3  DDD  555  444  444      000