I am trying to implement SCD Type 2 (SCD2) in Python. The source files are .xlsx files, and the data is loaded into PostgreSQL.
I identify the changed rows in the merged DataFrame with the following condition:
ChangedRecords = np.where((All_Data.is_current_x.notnull()) & (All_Data.is_current_y.notnull())
& ((All_Data['col1_x'] != All_Data['col1_y'])
| (All_Data['col2_x'] != All_Data['col2_y'])
| (All_Data['col3_x'] != All_Data['col3_y'])
... ),1,0)
The problem is that NULL values from the source (.xlsx) show up as nan, while NULL values from the destination (PostgreSQL) show up as None.
My code therefore treats these values as different, so every row that contains a NULL ends up in ChangedRecords, which is wrong.
The columns have a string data type.
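Part of the difficulty is that `!=` can never treat two missing values as equal: `np.nan != np.nan` is already True, and pandas also reports NaN versus None as unequal. A minimal sketch of a null-safe inequality follows. It assumes the merged frame has `_x`/`_y` column pairs as in the condition above; `changed`, `all_data` and `compare_cols` are hypothetical names, and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd


def changed(left: pd.Series, right: pd.Series) -> pd.Series:
    """Element-wise 'values differ', treating NaN and None as the same missing value."""
    both_missing = left.isna() & right.isna()  # isna() recognises both NaN and None
    return (left != right) & ~both_missing


# Hypothetical merged frame standing in for All_Data
all_data = pd.DataFrame({
    "is_current_x": [1, 1, 1, 1],
    "is_current_y": [1, 1, 1, 1],
    "col1_x": ["a", np.nan, "c", None],
    "col1_y": ["a", None, "z", "d"],
})

compare_cols = ["col1"]  # hypothetical list; add col2, col3, ... as needed
# OR together the per-column "changed" masks
any_diff = np.logical_or.reduce(
    [changed(all_data[f"{c}_x"], all_data[f"{c}_y"]) for c in compare_cols]
)
changed_records = np.where(
    all_data.is_current_x.notnull() & all_data.is_current_y.notnull() & any_diff,
    1,
    0,
)
print(changed_records.tolist())  # [0, 0, 1, 1]
```

Row 2 (NaN versus None) is no longer flagged, while a NULL on only one side (row 4) still counts as a change. With this comparison the NaN/None mismatch no longer matters, so no conversion is needed just for the change detection.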
I used the code below to replace them, but it does not seem to work:
Source = ftp_invoices.replace([np.nan],[None])
I still get nan in the NULL values.
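One way to check what those cells actually contain is to compare a real missing value with the literal text "nan". The sketch below uses a made-up column (`col` is a hypothetical name) holding one genuine NaN and one string "nan"; if the string check comes back True for your NULL cells, `replace([np.nan], [None])` has nothing to match.

```python
import numpy as np
import pandas as pd

# Hypothetical column: one real NaN and one literal string "nan"
col = pd.Series(["x", np.nan, "nan"], dtype=object)

real_missing = col.isna()   # True only for genuine NaN/None
string_nan = col.eq("nan")  # True only for the text "nan"

print(real_missing.tolist())     # [False, True, False]
print(string_nan.tolist())       # [False, False, True]
print(col.map(type).tolist())    # [<class 'str'>, <class 'float'>, <class 'str'>]
```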
I think nan is a string here, not NaN (note the case). Try:
Example:
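A minimal sketch of that suggestion, assuming the NULL cells hold the literal text "nan" (for example after converting the columns with `astype(str)`). The frame contents are made up, and `ftp_invoices` here is only a stand-in for the real source frame. The string is first turned back into a real missing value; if Python None is specifically needed (for example for row-by-row inserts into PostgreSQL), the conversion is done per value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ftp_invoices: the missing cell holds the text "nan"
ftp_invoices = pd.DataFrame({"col1": ["a", "nan", "c"]})

# Replace the literal string "nan" (not np.nan) with a real missing value
source = ftp_invoices.replace("nan", np.nan)
print(source["col1"].isna().tolist())  # [False, True, False]

# If Python None is required, convert each missing value explicitly
rows = [
    tuple(None if pd.isna(v) else v for v in row)
    for row in source.itertuples(index=False, name=None)
]
print(rows)  # [('a',), (None,), ('c',)]
```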