Making Dedupe learn from existing label data

I am aware that Dedupe uses active learning to remove duplicates and perform record linkage.

However, I would like to know whether we can pass an Excel sheet of already-matched pairs (labeled data) as the input for active learning.

There are 1 best solutions below

Not directly.

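You'll need to get your data into a format that markPairs can consume (newer dedupe 2.x releases spell it mark_pairs): a dict with 'match' and 'distinct' keys, each holding a list of record pairs.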

Something like:

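# Each entry is a pair (2-tuple) of record dicts.
labeled_examples = {'match'    : [],
                    'distinct' : [({'name' : 'Georgie Porgie'},
                                   {'name' : 'Georgette Porgette'})]
                    }
# deduper is your existing dedupe.Dedupe instance.
deduper.markPairs(labeled_examples)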
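If the spreadsheet holds one labeled pair per row, the conversion could look roughly like the sketch below. It assumes the sheet has been exported to CSV and that the columns are called name_a, name_b and is_match (1 for a match, 0 for distinct). Those column names, the sample rows and the build_labeled_examples helper are all made up for illustration; none of them are part of dedupe.

```python
import csv
import io

# Hypothetical CSV export of the spreadsheet: one labeled pair per row.
# Column names (name_a, name_b, is_match) are assumptions, not a dedupe convention.
SAMPLE_CSV = """name_a,name_b,is_match
Georgie Porgie,Georgette Porgette,0
Jack Sprat,Jack Spratt,1
"""


def build_labeled_examples(csv_file, fields=("name",)):
    """Turn rows of labeled pairs into {'match': [...], 'distinct': [...]}."""
    labeled = {"match": [], "distinct": []}
    for row in csv.DictReader(csv_file):
        # Split each row into the two records being compared.
        record_a = {field: row[field + "_a"] for field in fields}
        record_b = {field: row[field + "_b"] for field in fields}
        label = "match" if row["is_match"].strip() == "1" else "distinct"
        labeled[label].append((record_a, record_b))
    return labeled


labeled_examples = build_labeled_examples(io.StringIO(SAMPLE_CSV))

# With a configured deduper (not created here), you would then call:
# deduper.markPairs(labeled_examples)
```

The record dicts should use the same field names as the fields you defined for your deduper; pass them through the fields argument if you have more than name.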

We do provide a convenience function, trainingDataDedupe (training_data_dedupe in dedupe 2.x), for getting spreadsheet data into this format.

(I am an author of dedupe)