Use the Python Dedupe package to check a single record


I am using the Dedupe Python package to check incoming records for duplicates. I have trained on approximately 500,000 records from a CSV file and, using the package, clustered those 500,000 records into different clusters. I then attempted to use the settings file produced by training to dedupe a new record (`data` in the code). A code snippet is below.

import os

import dedupe

# Path to the settings file written out by the training run
settings_file = 'dedupe_learned_settings'

deduper = None
if os.path.exists(settings_file):
    with open(settings_file, 'rb') as sf:
        deduper = dedupe.StaticDedupe(sf)

clustered_dupes = deduper.match(data, 0)

Here, `data` is the single new record I have to check for duplicates. It looks like:

{1:{'SequenceID': 6855406, 'ApplicationID': 7065902, 'CustomerID': 6153222, 'Name': 'X', 'col1': '-42332423', 'col2': '0', 'col3': '0', 'col4': '0', 'col5': '24G0859681', 'col6': '0', 'col7': 'xyz12345', 'col8': 'xyz', 'col9': '1234', 'col10': 'xyz10'}}

This throws the error:

No records have been blocked together. Is the data you are trying to match like the data you trained on?

How do I use this clustered data to check whether a new record is a duplicate or not? Is it possible to do this the way we would with any ML model? I have looked at multiple sources but haven't found a solution; most of them talk about training, not about how to use the clustered data to check a single record.

Is there another way out?

Some links that I have referred: link1 link2 link3

Any help is appreciated.

1 Answer

You would need to pass the initially trained data along with the new record as input, and cluster based on the pre-trained settings.
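A minimal sketch of that idea, assuming a `dedupe` version whose `StaticDedupe` exposes `match()` as in the question (the settings-file name and variable names below are placeholders, not part of the original post). The key point is that `match()` needs the existing records and the new record together in one dict, re-keyed so IDs don't collide:

```python
def build_match_input(existing_records, new_record):
    """Merge existing records with one new record under a unique key.

    existing_records: dict mapping record IDs (ints) to field dicts,
    i.e. the same data you trained on. Returns the key assigned to the
    new record and the combined dict to pass to match().
    """
    combined = dict(existing_records)       # shallow copy, keep original keys
    new_key = max(combined, default=0) + 1  # any key not already in use
    combined[new_key] = new_record
    return new_key, combined


# Hypothetical usage against a trained settings file:
#
#   with open('dedupe_learned_settings', 'rb') as sf:
#       deduper = dedupe.StaticDedupe(sf)
#   new_key, combined = build_match_input(trained_data, data[1])
#   clusters = deduper.match(combined, 0.5)  # threshold > 0, tune for your data
#   is_dupe = any(new_key in ids for ids, scores in clusters)
```

Passing a single record on its own fails because there is nothing to block it against, which is what the "No records have been blocked together" error is complaining about.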