How can I derive a list of records that are not common across two dataframes?


I have two dataframes relating to securities - same structures / datatypes, just different sizes.

df1:

     security_ID     market_cap
0    ajax123         100000
1    apple456        10000
2    amazon513       20000
3    firefly312      200000


df2:
    
     security_ID     market_cap
0    ajax123         100000
1    apple456        10000
2    amazon513       20000
3    google566       200000

I want to do a VLOOKUP-style check to identify the security IDs that are in df1 but not in df2, and vice versa. I would then like to drop these security IDs so that I have two equalised dataframes for further analysis.

I have tried to use the following approach to get this, but to no avail:

df1['sec_id_check'] = df1['security_ID'].isin(df2['security_ID'])

Ideally, this should have populated df1['sec_id_check'] with 'True' and 'False', but all I get is 'True' across all 12,498 entries. I repeated exactly the same approach in reverse for df2, creating the df2['sec_id_check'] column, and again I got only 'True' across all 12,510 records.

I know for a fact that there are securities that don't exist in both datasets: firefly312 in df1 doesn't exist in df2, and google566 is in df2 but not in df1. I would have expected these to be flagged as 'False' in my test.

Look forward to your responses - thanks very much in advance!


There are 2 best solutions below

7
On BEST ANSWER

Your code works for me on the sample data:

m = df1['security_ID'].isin(df2['security_ID'])
print(df1[m])
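
To get from that mask to the two equalised dataframes the question asks for, one option is to negate the mask to list the unmatched IDs and keep only the matched rows on each side. This is a minimal sketch built on the sample data from the question; the names in_df2, in_df1, only_in_df1, only_in_df2, df1_eq and df2_eq are illustrative and not from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "firefly312"],
    "market_cap": [100000, 10000, 20000, 200000],
})
df2 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "google566"],
    "market_cap": [100000, 10000, 20000, 200000],
})

# Boolean masks: True where the ID also appears in the other frame.
in_df2 = df1["security_ID"].isin(df2["security_ID"])
in_df1 = df2["security_ID"].isin(df1["security_ID"])

# IDs that appear on one side only (the negated masks).
only_in_df1 = df1.loc[~in_df2, "security_ID"].tolist()
only_in_df2 = df2.loc[~in_df1, "security_ID"].tolist()

# Keep only the rows whose ID exists in both frames.
df1_eq = df1[in_df2].reset_index(drop=True)
df2_eq = df2[in_df1].reset_index(drop=True)

print(only_in_df1)
print(only_in_df2)
```

After this, df1_eq and df2_eq contain the same set of security IDs, which is the "equalised" state described in the question.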
7
On

Let's use pd.DataFrame.compare, new in pandas version 1.1.0.

df1.compare(df2)

Output:

 security_ID           
         self      other
3  firefly312  google566
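
Note that compare expects both dataframes to have identical row and column labels, so it fits the four-row sample but would raise a ValueError on frames of different lengths such as the 12,498 and 12,510 rows mentioned in the question. A different technique that does not have this restriction is an outer merge with indicator=True. The sketch below swaps in that technique on the sample data; the names merged and unmatched are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "firefly312"],
    "market_cap": [100000, 10000, 20000, 200000],
})
df2 = pd.DataFrame({
    "security_ID": ["ajax123", "apple456", "amazon513", "google566"],
    "market_cap": [100000, 10000, 20000, 200000],
})

# Outer merge on the ID column only; indicator=True adds a "_merge"
# column with the values "left_only", "right_only" or "both".
merged = df1[["security_ID"]].merge(
    df2[["security_ID"]], on="security_ID", how="outer", indicator=True
)

# Rows that are not present in both frames.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```

In the result, "left_only" marks IDs found only in df1 and "right_only" marks IDs found only in df2.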