Merge columns from several dataframes with specific values with Pandas

I have 7 dataframes with only "OK" and "KO" values, and the only column that connects everything is the ID.

df1:
ID, Name, Address, Email
1, OK, OK, OK
2, OK, KO, OK
3, OK, OK, KO

df2:
ID, Job, Credit_Card, Driving_License_Number
1, OK, OK, OK
2, KO, KO, OK
3, OK, OK, OK

I'm trying to find a way to query or merge all the "KO" values into a single CSV file / DataFrame so I can easily check which columns failed the test.

Something like this:

ID_2, ID_3
Address, Email
Job
Credit_Card

So, with this I know that ID_2 is missing the Address, Job and Credit Card information and ID_3 is missing the Email.

There are 1 best solutions below

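Let's merge them on ID first, then do a matrix multiplication. Multiplying the boolean "KO" mask by the column names (each with ', ' appended) concatenates the failing column names for each ID: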

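# Join on ID and use it as the index (chain more .merge(..., on='ID') calls for the other dataframes)
merged = df1.merge(df2, on='ID').set_index('ID')

# True where a check failed; .str[:-2] trims the trailing ', '
(merged.eq('KO') @ (merged.columns + ', ')).str[:-2]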

Output:

ID
1                             
2    Address, Job, Credit_Card
3                        Email
dtype: object
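
Since the question mentions 7 dataframes and asks for one column per failing ID, here is a minimal sketch of how the same idea might be extended. It rebuilds df1 and df2 from the sample data, assumes every dataframe has a unique ID column, and uses pd.concat instead of chained merges. The names frames, is_ko, failed and wide are only illustrative, and the explicit join stands in for the matrix multiplication above.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["OK", "OK", "OK"],
    "Address": ["OK", "KO", "OK"],
    "Email": ["OK", "OK", "KO"],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Job": ["OK", "KO", "OK"],
    "Credit_Card": ["OK", "KO", "OK"],
    "Driving_License_Number": ["OK", "OK", "OK"],
})

# Hypothetical list: append the remaining dataframes here.
frames = [df1, df2]

# Align all frames side by side on ID (assumes IDs are unique per frame).
merged = pd.concat([f.set_index("ID") for f in frames], axis=1)

# Boolean mask: True where a check failed.
is_ko = merged.eq("KO")

# One comma-separated string of failing columns per ID.
failed = pd.Series(
    {i: ", ".join(merged.columns[mask])
     for i, mask in zip(merged.index, is_ko.to_numpy())}
)
failed.index.name = "ID"

# Layout from the question: one column per ID that has at least one "KO".
wide = pd.DataFrame(
    {f"ID_{i}": pd.Series(merged.columns[mask])
     for i, mask in zip(merged.index, is_ko.to_numpy())
     if mask.any()}
)

print(failed)
print(wide)
```

Shorter columns in wide are padded with NaN, and wide.to_csv(...) would write the result to a file if a CSV is needed.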