dedupe OverflowError on record linkage

I want to use the Dedupe library for record linkage. I adapted this code from the Dedupe examples on GitHub, but when I run it I get this error:

OverflowError: Python int too large to convert to C ssize_t

I think it happens because my data is very large. How can I filter the columns of data_d? That should help. I searched all the related Stack Overflow questions but couldn't find the right answer.
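For context, this is the message CPython raises whenever an integer larger than `sys.maxsize` has to fit into a C `Py_ssize_t`, for example as the return value of `len()`. The minimal sketch below reproduces it with a plain `range` (the function name `ssize_t_overflow_demo` is hypothetical). Whether Dedupe hits it through a similar length or index computation on a large dataset is an assumption; the error message alone does not say where it comes from.

```python
import sys


def ssize_t_overflow_demo():
    """Return the OverflowError message for len() of an oversized range (hypothetical helper)."""
    fits = range(sys.maxsize)       # length equals sys.maxsize: still representable
    assert len(fits) == sys.maxsize

    huge = range(sys.maxsize + 1)   # creating the range object itself is fine
    try:
        len(huge)                   # len() must return a Py_ssize_t, which cannot hold this value
    except OverflowError as exc:
        return str(exc)
    return None


print(ssize_t_overflow_demo())
```

On 64-bit CPython `sys.maxsize` is `2**63 - 1`, so the value that overflows is likely a derived quantity (a count or size computed from the data) rather than something you can see directly in the CSV.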

import csv


def readData(filename):
    """
    Read in our data from a CSV file and create a dictionary of records,
    where the key is a unique record ID.
    """
    data_d = {}

    # newline='' lets the csv module handle quoted fields that contain line breaks
    with open(filename, encoding='utf-8', newline='') as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader):
            # preProcess is the cleaning helper from the Dedupe CSV example (defined elsewhere)
            clean_row = {k: preProcess(v) for (k, v) in row.items()}
            data_d[filename + str(i)] = clean_row

    return data_d
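One way to keep only the columns you need is to select them while reading, instead of copying every field of each row. The sketch below assumes hypothetical helper names (`select_columns`, `read_selected_columns`) and a `clean` parameter standing in for `preProcess`. Note that this reduces memory per record but does not change the number of records, so it may not by itself make the OverflowError go away.

```python
import csv


def select_columns(rows, fields, id_prefix="", clean=lambda v: v):
    """Build {record_id: {field: cleaned_value}} keeping only `fields` (hypothetical helper)."""
    data_d = {}
    for i, row in enumerate(rows):
        # Keep only the requested columns; every other column is dropped here
        data_d[id_prefix + str(i)] = {k: clean(row[k]) for k in fields}
    return data_d


def read_selected_columns(filename, fields, clean=lambda v: v):
    """Read a UTF-8 CSV and keep only `fields`, keyed like readData (hypothetical helper)."""
    with open(filename, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in fields if c not in header]
        if missing:
            # Fail early with a clear message instead of a KeyError mid-file
            raise KeyError(f"columns not in CSV header: {missing}")
        return select_columns(reader, fields, id_prefix=filename, clean=clean)
```

Usage, with a hypothetical file name and column names: `data_d = read_selected_columns('input.csv', ['name', 'address'], clean=preProcess)`. Keeping the column list equal to the fields in your Dedupe field definition is a reasonable default, since the other columns are presumably not used for comparison.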