Fix spelling errors in unique identifiers of data

33 Views Asked by Katie At 27 July 2025 at 10:56

I have 6,000 items (just a sample of some 200,000 entries). The unique identifier is a company name (not my choice), and the company names contain spelling mistakes. I'm using the Levenshtein distance algorithm to decide whether one company name is, say, 90% similar to another. If it is, I combine the two entries. Comparing every company name against every other company name means 6,000^2 (about 36 million) iterations, which takes over ten minutes. The entries are stored in a C++ std::map, where the company name is the key and the associated data is the value. Any ideas on how I can accurately decide whether two company names might be the same, allowing for small spelling errors or abbreviations, without a nested for loop?
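
For reference, here is a minimal sketch of the comparison described above. It is not the asker's actual code. The names `levenshtein`, `similarity`, and `similarPairs` are hypothetical, and similarity is assumed to mean 1 - distance / (length of the longer name). The sketch visits each unordered pair only once, and it skips pairs whose length difference alone already rules out the threshold, which is safe because the edit distance can never be smaller than the difference in length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Two-row Levenshtein edit distance: insert, delete, and substitute each cost 1.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);  // the row just computed becomes the previous row
    }
    return prev[b.size()];
}

// Assumed definition of "percent similar": 1 - distance / longer length.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;  // two empty strings are identical
    return 1.0 - static_cast<double>(levenshtein(a, b)) / static_cast<double>(longest);
}

// Returns every pair of keys whose similarity reaches the threshold.
// Value stands in for whatever data is stored per company (hypothetical).
template <typename Value>
std::vector<std::pair<std::string, std::string>>
similarPairs(const std::map<std::string, Value>& entries, double threshold = 0.9) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    std::vector<std::pair<std::string, std::string>> result;
    for (auto i = entries.begin(); i != entries.end(); ++i) {
        // Start after i: each unordered pair is visited once, not twice.
        for (auto j = std::next(i); j != entries.end(); ++j) {
            const std::string& a = i->first;
            const std::string& b = j->first;
            const std::size_t longest = std::max(a.size(), b.size());
            const std::size_t shortest = std::min(a.size(), b.size());
            if (longest == 0) continue;
            // Best possible similarity given only the length difference.
            const double upperBound =
                1.0 - static_cast<double>(longest - shortest) / static_cast<double>(longest);
            if (upperBound < threshold) continue;  // cannot reach the threshold
            if (similarity(a, b) >= threshold) result.emplace_back(a, b);
        }
    }
    return result;
}
```

Calling `similarPairs(entries)` on the map returns the candidate pairs to merge. This roughly halves the number of comparisons and avoids the expensive distance computation for names of very different lengths, but it is still a nested loop and quadratic in the worst case.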
