Dedupe library in Python - problem with log file


I am having some issues with a log file I create while using dedupe. These are the functions I use to write to the log file:

import datetime
import sys

# log_log_file is a module-level file handle, opened near the top of the script
global log_log_file

def writeErrorLogMessage(message):
    # write an ERROR line, close the log and stop the whole process
    execution_log_line = str(datetime.datetime.now()) + ', - ERROR, ' + message + ". The process was stopped\n"
    log_log_file.write(execution_log_line)
    log_log_file.flush()
    log_log_file.close()
    sys.exit()

def writeInfoLogMessage(message):
    # write an INFO line and flush it so it shows up immediately
    execution_log_line = str(datetime.datetime.now()) + ', - ' + message + ".\n"
    log_log_file.write(execution_log_line)
    log_log_file.flush()
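
For context, the log file itself is opened near the top of the script, at module level, before the if __name__ == '__main__': part. Simplified, it looks roughly like this; the real file name in my script is built differently, but the open() call is not inside any function or guard:

import datetime

# simplified: the actual name pattern differs, but the handle is created
# at module level, so this line runs every time the module is loaded
log_log_file = open('logfile_' + datetime.datetime.now().strftime('%Y%m%d_%H%M%S') + '.log', 'w')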

When I start the clustering process with:

clustered_dupes = deduper.partition(data_d, threshold=th)

it unexpectedly creates new log files. For example: I start the code and it creates logfile1; when it gets to the clustering it creates logfile2, logfile3, logfile4, logfile5 and logfile6. But these are not copies of the original log file: they only contain the log written before the actual file import, so they skip the whole chunk of code between the first checks and the clustering. When the clustering has finished, the file that keeps being updated is logfile1, not logfile6, which is the last one created.

So somehow those extra files are opened and closed (I only close the log file on the last line of the code, so it is strange that the newly created files skip such a big chunk of code). I think that maybe the clustering re-executes my code, but not as the main module, so it skips everything inside the

if __name__ == '__main__':

block. I looked for the relevant code under deduper.clustering, but when I printed dedupe.__file__ and checked the dedupe package directory, it does not contain anything called deduper or partition, so it probably calls another module whose name I don't know. How can I avoid this?
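
If my guess about the __main__ guard is right, I suppose the workaround would be to move everything that should run only once, including opening the log file, inside the guard, roughly like the sketch below; open_log_file and run are just placeholders for the real code in my script:

import datetime

def open_log_file():
    # placeholder: in the real script the file name is built differently
    name = 'logfile_' + datetime.datetime.now().strftime('%Y%m%d_%H%M%S') + '.log'
    return open(name, 'w')

def run(log_log_file):
    # placeholder for the rest of the script: reading the data, training the
    # deduper and finally calling deduper.partition(data_d, threshold=th)
    pass

if __name__ == '__main__':
    # only the main process executes this block; the worker processes started
    # for the clustering re-import the module, but for them __name__ is not
    # '__main__', so they would not reopen the log file
    log_log_file = open_log_file()
    run(log_log_file)
    log_log_file.close()

Would this be the right way to structure the script? Thanks a lot.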
