I have 12 million company names in my database. I want to match them against a list offline, and I would like to know the best algorithm for doing so. I tried Levenshtein distance, but it is not giving the expected results. Could you please suggest some algorithms for this? The problem is matching companies like:
G corp. ---- this needs to be mapped to G corporation
water Inc ---- this needs to be mapped to Water Incorporated
Use MatchKraft to fuzzy match company names on two lists.
http://www.matchkraft.com/
Levenshtein distance is not enough to solve this problem. You also need the following:
It is better to use an existing tool rather than writing your own program in Python.
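To see why plain edit distance falls short on the two examples from the question, here is a minimal Python sketch that expands legal-form abbreviations before comparing. It is only an illustration, not how MatchKraft or any other tool works internally. The `SUFFIXES` table and the helper names `normalize` and `similarity` are hypothetical, and the comparison uses the standard library's `difflib.SequenceMatcher` ratio rather than Levenshtein distance.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table (an assumption for this sketch);
# a real one would need many more entries and country-specific forms.
SUFFIXES = {
    "corp": "corporation",
    "inc": "incorporated",
    "ltd": "limited",
    "co": "company",
}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(SUFFIXES.get(token, token) for token in tokens)

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Comparing the raw strings penalizes the abbreviation heavily.
raw = SequenceMatcher(None, "G corp.", "G corporation").ratio()
print(raw)

# After normalization both examples from the question compare as identical.
print(similarity("G corp.", "G corporation"))
print(similarity("water Inc", "Water Incorporated"))
```

With 12 million names, comparing every pair is too slow, so one option is to store the normalized form as a key and try an exact lookup first, leaving fuzzy comparison for the names that still have no match.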