Increase the max_components variable in the dedupe library


How can I increase the default value of the max_components variable?

By default, max_components is set to 30000. I need to increase this limit because every time I run deduplication (on the same datasets) I get different results.

I think that the total number of clusters in my data is larger than 30000.


Answer from GitHub

From the dedupe GitHub issue "Increase max_components = 30000":

If you are getting different results using the same saved settings file, then what you are reporting is a bug. If you are getting different results from different training data (or even from the same training data), that's expected, because at various points dedupe uses a random sample to learn good rules.
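To see why a random sample alone can make results differ between runs, here is a generic Python sketch (not dedupe's code). The candidate pairs and the helper draw_training_sample are hypothetical. It shows that two unseeded draws usually differ, while two draws with the same seed are identical. Whether dedupe itself exposes or honors a seed is an assumption not covered by the answer above. The answer's own route to stable results is reusing a saved settings file.

```python
import random

# Hypothetical pool of candidate record pairs; in a real run these would
# come from your data, not from range().
candidate_pairs = [(i, i + 1) for i in range(1000)]


def draw_training_sample(pairs, size, seed=None):
    """Draw a random sample of pairs; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    return rng.sample(pairs, size)


# Two unseeded draws usually differ, so anything learned from them can differ too.
unseeded_a = draw_training_sample(candidate_pairs, 50)
unseeded_b = draw_training_sample(candidate_pairs, 50)

# Two draws with the same seed are identical.
seeded_a = draw_training_sample(candidate_pairs, 50, seed=42)
seeded_b = draw_training_sample(candidate_pairs, 50, seed=42)
print(seeded_a == seeded_b)  # True
```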

In either case, I doubt that max_components is related. But if you want to change it, fork the code and change it there.
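One way to see why max_components may be unrelated to the total number of clusters: in the dedupe clustering code I am aware of, max_components appears to cap how many records a single connected component of candidate matches may contain, with larger components re-split using a higher threshold. It does not appear to cap how many clusters you get overall. Verify this against the clustering module of your installed version. The sketch below is plain union-find, not dedupe's implementation. MAX_COMPONENT_SIZE, connected_components, and oversized are hypothetical names. It shows that a dataset can produce many components while none of them exceeds a size cap.

```python
from collections import defaultdict

# Hypothetical cap, mirroring the default value discussed in the question.
MAX_COMPONENT_SIZE = 30000


def connected_components(pairs):
    """Group record ids linked by matched pairs into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def oversized(components, max_size=MAX_COMPONENT_SIZE):
    """Return the components whose size exceeds the cap (count of components is irrelevant)."""
    return [c for c in components if len(c) > max_size]


# Toy example: three separate components, none anywhere near the default cap.
pairs = [(1, 2), (2, 3), (4, 5), (6, 7), (7, 8), (8, 6)]
components = connected_components(pairs)
print(len(components))  # 3
print([len(c) for c in oversized(components, max_size=2)])  # [3, 3]
```

Under this reading, having more than 30000 clusters in total would not by itself hit the limit. Only a single component with more than 30000 records would.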