I want to reconcile data by reading it from two different databases. The job runs on a single machine, and I get an out-of-memory error while reading the data.
I read the data with pandas, using two different SQL queries as source and target. I haven't been able to use chunksize, because I need to pull the same rows from both systems in order to reconcile them.
My main problem is pulling the same data from the two systems in chunks and comparing those chunks in a loop.
My current code:
smartdq_logging('Log', gv_log_level, f"connecting to source db...")
cl_connect_sourcedb = lib.environment_connection.DBConnect()
cl_connect_sourcedb.connect_with_environment_id(environmentIdSource, "SMARTDQ", cl_connect_smartdq)
smartdq_logging('Log', gv_log_level, f"connecting to target db...")
cl_connect_target = lib.environment_connection.DBConnect()
cl_connect_target.connect_with_environment_id(environmentIdTarget, "SMARTDQ", cl_connect_smartdq)
source_query = """SELECT
LEVEL AS id,
ROUND(DBMS_RANDOM.VALUE(0, 99999) / 100000, 5) AS value
FROM
DUAL
CONNECT BY
LEVEL <= 1000000"""
target_query = """SELECT
LEVEL AS id,
ROUND(DBMS_RANDOM.VALUE(0, 99999) / 100000, 5) AS value
FROM
DUAL
CONNECT BY
LEVEL <= 1000000"""
print('start recon...')
df_source = db_func.read_data_to_dataframe(source_query, cl_connect_sourcedb, "original")
# note: this previously passed source_query by mistake; the target read must use target_query
df_target = db_func.read_data_to_dataframe(target_query, cl_connect_target, "original")
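One way to sketch the chunked comparison described above: if both queries are sorted by the same stable key (e.g. add `ORDER BY id` to each), chunk N of the source lines up with chunk N of the target, and the two chunked iterators can be walked in lockstep. The sketch below uses plain `pandas.read_sql` with raw connections instead of the `DBConnect`/`db_func` wrappers from the code above (those are internal, so their chunking support is an assumption); the function name `reconcile_in_chunks` and the `id`/`value` column names are illustrative, taken from the sample queries.

```python
import pandas as pd
from itertools import zip_longest

def reconcile_in_chunks(source_query, target_query, source_conn, target_conn,
                        chunksize=100_000):
    """Compare two result sets chunk by chunk without loading either fully.

    Assumes both queries end with ORDER BY on the same key (e.g. id) and
    return the same row count, so chunk N of the source corresponds to
    chunk N of the target. If keys can be missing on one side, alignment
    can drift and a key-range-based pagination scheme is safer.
    """
    src_iter = pd.read_sql(source_query, source_conn, chunksize=chunksize)
    tgt_iter = pd.read_sql(target_query, target_conn, chunksize=chunksize)

    mismatches = []
    for src_chunk, tgt_chunk in zip_longest(src_iter, tgt_iter):
        if src_chunk is None or tgt_chunk is None:
            raise ValueError("row counts differ between source and target")
        # outer-merge the aligned chunks on the key; _merge flags rows
        # present on only one side, the value comparison flags differences
        merged = src_chunk.merge(tgt_chunk, on="id", how="outer",
                                 suffixes=("_src", "_tgt"), indicator=True)
        diff = merged[(merged["_merge"] != "both") |
                      (merged["value_src"] != merged["value_tgt"])]
        if not diff.empty:
            mismatches.append(diff)
    return pd.concat(mismatches) if mismatches else pd.DataFrame()
```

Memory then stays bounded by roughly two chunks plus the accumulated mismatches; if mismatches can be large, they could be written out per chunk instead of concatenated at the end. Note that `DBMS_RANDOM` queries like the samples above return different values on every execution, so for a meaningful test both sides need deterministic data.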