Cluster New Record in Dedupe Clustered Table


I am using Python Dedupe for de-duplication of our MDM database. So far it works fine after sufficient training, and it produces an entity map table that shows the cluster IDs, the canonical name, and a score.

I'm stuck on one thing: when a new record is inserted into the database, I'm not sure how it can be merged into the existing clusters in the entity_map table. I could not find a function for this in the dedupe documentation either.

Running the entire process (creating the blocking map, the plural key, and the clustered dupes) again for the new records would be costly, so I'm looking for a less expensive way to cluster new records against the existing clusters in the entity_map table.
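To make the goal concrete, here is a rough sketch of the kind of incremental step I have in mind. It does not use dedupe's API or its trained model at all: it uses `difflib` from the standard library as a stand-in similarity measure. The function names, the `{cluster_id: canonical_name}` shape of the entity map, and the 0.85 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def assign_to_cluster(new_name, entity_map, threshold=0.85):
    """Compare one new record against each cluster's canonical name.

    entity_map is assumed to be {cluster_id: canonical_name}.
    Returns (cluster_id, score) for the best match at or above the
    threshold, otherwise (None, best_score) so the caller can open
    a new cluster for the record.
    """
    target = normalize(new_name)
    best_id, best_score = None, 0.0
    for cluster_id, canonical in entity_map.items():
        # Stand-in similarity; the real score would come from the trained model.
        score = SequenceMatcher(None, target, normalize(canonical)).ratio()
        if score > best_score:
            best_id, best_score = cluster_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Hypothetical entity map and new records.
entity_map = {1: "Acme Corporation", 2: "Globex Inc"}
print(assign_to_cluster("ACME  Corporation", entity_map))
print(assign_to_cluster("Zzz Qqq", entity_map))
```

This scans every canonical name, so it skips blocking entirely and only compares a single field. What I would like is the equivalent step that reuses the trained model and the existing blocking map.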
