Python nan and None


I am trying to implement SCD2 (slowly changing dimension type 2) in Python. The source files are .xlsx files, and the data will be loaded into PostgreSQL.

I identify the changed rows in the merged dataframe with the following condition:

ChangedRecords = np.where((All_Data.is_current_x.notnull()) & (All_Data.is_current_y.notnull())
                              & ((All_Data['col1_x'] != All_Data['col1_y'])
                                 | (All_Data['col2_x'] != All_Data['col2_y'])
                                 | (All_Data['col3_x'] != All_Data['col3_y'])
                                 ... ),1,0)

The issue is that NULL values from the source (.xlsx) show up as nan, while NULL values from the destination (PostgreSQL) show up as None. My code therefore treats these values as different, so every row that contains a NULL ends up in ChangedRecords, which is wrong.

The columns are of string type.
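The symptom can be reproduced with a tiny merged frame. This is a minimal sketch, not the actual SCD2 job: the frames `source` and `target`, the key `id` and the column `col1` are hypothetical stand-ins, with a missing source value stored as NaN and a PostgreSQL NULL stored as None.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: .xlsx source (missing -> NaN), PostgreSQL target (NULL -> None).
source = pd.DataFrame({
    "id": [1, 2],
    "col1": pd.Series(["a", np.nan], dtype=object),
    "is_current": [True, True],
})
target = pd.DataFrame({
    "id": [1, 2],
    "col1": pd.Series(["a", None], dtype=object),
    "is_current": [True, True],
})

# Outer merge adds the default _x/_y suffixes to the overlapping columns.
merged = source.merge(target, on="id", how="outer")

# Same shape of condition as in the question, reduced to one column.
changed = np.where(
    merged.is_current_x.notnull() & merged.is_current_y.notnull()
    & (merged["col1_x"] != merged["col1_y"]),
    1, 0,
)
print(changed.tolist())  # [0, 1]
```

Row 2 is flagged even though both sides are NULL, because pandas evaluates NaN != None as True.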

I used the code below to replace them, but it does not seem to work:

Source = ftp_invoices.replace([np.nan],[None])

I still get nan for the NULL values.

Answer:

I think your nan is the string 'nan', not a real NaN (note the case: pandas displays a real missing value as NaN, but the string as nan).
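One way to confirm which kind of nan a column holds is to compare isna() with an equality check on the literal string. A minimal sketch, using a made-up Series `s` in place of the real column:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a real missing value and the literal string 'nan'.
s = pd.Series(["abc", np.nan, "nan"], dtype=object)

# isna() flags only real missing values (NaN/None), never the string 'nan'.
print(s.isna().tolist())              # [False, True, False]

# Comparing against the literal string finds the text values instead.
print(s.eq("nan").tolist())           # [False, False, True]

# The Python types make the difference explicit.
print([type(v).__name__ for v in s])  # ['str', 'float', 'str']
```

If eq("nan") finds matches, replacing np.nan will not touch those values; they have to be replaced as strings.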

Try:

Source = ftp_invoices.replace('nan', None)

Example:

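import numpy as np
import pandas as pd

# object dtype keeps None, NaN and the string 'nan' distinct
df = pd.DataFrame({'A': [None, np.nan, 'nan']}, dtype=object)
print(df)

# Output
      A
0  None  # <- None
1   NaN  # <- np.nan
2   nan  # <- the string 'nan'

# Replace real NaN by None
>>> df.replace(np.nan, None)
      A
0  None
1  None  # Only here
2   nan

# Replace 'nan' strings by None
>>> df.replace('nan', None)
      A
0  None
1   NaN
2  None  # Only here

# Combine both (the dict form also works on older pandas versions,
# where replace(x, None) forward-fills instead of inserting None)
>>> df.replace({'nan': None, np.nan: None})
      A
0  None
1  None
2  None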
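Normalizing nan to None fixes the display, but it may not be enough for the comparison itself: in element-wise comparisons pandas treats None as missing, so None != None evaluates to True, just like NaN != NaN. A null-safe inequality that counts two missing values as equal avoids this. The sketch below assumes the question's _x/_y suffixes; `null_safe_ne`, `compare_cols` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def null_safe_ne(left: pd.Series, right: pd.Series) -> pd.Series:
    """True where values differ; two missing values (NaN or None) count as equal."""
    return (left != right) & ~(left.isna() & right.isna())


# Hypothetical merged frame following the question's _x/_y naming.
all_data = pd.DataFrame(
    {
        "is_current_x": [True, True, True, True],
        "is_current_y": [True, True, True, True],
        "col1_x": ["a", np.nan, "b", np.nan],
        "col1_y": ["a", None, "c", "d"],
        "col2_x": ["p", "q", np.nan, "r"],
        "col2_y": ["p", "q", None, "r"],
    },
    dtype=object,
)

compare_cols = ["col1", "col2"]  # hypothetical; list the real compared columns here

both_present = all_data.is_current_x.notnull() & all_data.is_current_y.notnull()

# OR together the null-safe differences of every compared column.
any_changed = pd.Series(False, index=all_data.index)
for col in compare_cols:
    any_changed |= null_safe_ne(all_data[f"{col}_x"], all_data[f"{col}_y"])

changed_records = np.where(both_present & any_changed, 1, 0)
print(changed_records.tolist())  # [0, 0, 1, 1]
```

isna() recognizes both NaN and None, so this works without converting one into the other; rows where only one side is missing are still flagged as changed, and literal 'nan' strings still need the replace step.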