How to Join two data frames using pandasql with condition?

255 Views Asked by At

I have two data frames as below,

    listCode= ['A1','A2','A3','A4','A5','A6']
    dfLookup = pd.DataFrame({'ID':listCode})

    data = [['Chicago', 'B1'], ['Madsion', 'A1'], ['NY', 'A4']]
    dftest = pd.DataFrame(data, columns=['City', 'Code', ])
    dftest['Found'] =''
    dftest

    City    Code    Found
    Chicago B1  
    Madsion A1  
    NY      A4  

Expected Result:

    City    Code    Found
    Chicago B1       0  
    Madsion A1       1
    NY      A4       1

I am trying to join the data frames using the below, but not sure how to use the case statements.

    import pandasql as pdsql
    import sqldf
    sQuery = """
       SELECT   dftest.City, dftest.Code, dftest.Found 
       FROM  dftest 
       LEFT JOIN dfLookup 
       ON dftest.Code= dfLookup.ID"""
       sqlResult = sqldf.run(sQuery )

Thanks for your help.

2

There are 2 best solutions below

0
PaulS On BEST ANSWER

Another possible solution (there is no need to use pandasql):

(dftest.merge(dfLookup, how='left', left_on='Code', right_on='ID')
 .assign(Found = lambda x: x['ID'].notnull().astype(int))
 .drop('ID', axis=1))

In case you really want to use pandasql:

from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())

q = '''
SELECT dftest.City, dftest.Code, 
       CASE WHEN dfLookup.ID IS NOT NULL THEN 1 ELSE 0 END AS Found
FROM dftest
LEFT JOIN dfLookup ON dftest.Code = dfLookup.ID
'''

pysqldf(q)

Output:

      City Code  Found
0  Chicago   B1      0
1  Madsion   A1      1
2       NY   A4      1
0
coco18 On

if you like, you can use .isin(list) function. It will give true or false in the column "Found" The code should look like this:

dftest['Found'] = dftest['Code'].isin(listCode)

the result is:

        City    Code    Found
0       Chicago B1      False
1       Madsion A1      True
2       NY      A4      True