How do I compare the similarity of person names using a metric?


I am working on a function that tolerates misspellings and aliases of person names. I have done some research and found that there are quite a few string-metric algorithms, and phonetic libraries as well.

I have tried several, and of those Jaro-Winkler gives fairly good results, as shown below.

compareStrings("elon musk","elon musk")     --> 1.0
compareStrings("elonmusk","elon musk")      --> 0.98
compareStrings("elon mush","elon musk")     --> 0.99
compareStrings("eln msuk","elon musk")      --> 0.94
compareStrings("elon","elon musk")          --> 0.89
compareStrings("musk","elon musk")          --> 0.0  // This is bad, but I can fix that.
compareStrings("mr elon musk","elon musk")  --> 0.81

The above uses the implementation from the Apache Commons library. I would like to know whether there is an implementation that serves this purpose better. Any help is appreciated.

Edit: @newuserua_ext @Trasher Thanks, I appreciate your time. I have gone through all the StackExchange Q&A related to this, and I posted this question to focus specifically on person names.


There are 2 best solutions below

Dan Armstrong On

Consider Double Metaphone. We use it successfully to find "sounds-like" matches for names. A Java implementation is available in Apache Commons Codec:

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html

Codor On

One possibility is the Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It can be computed with dynamic programming in time proportional to the product of the two string lengths, but it is not really suitable for determining phonetic similarity.
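
As a rough illustration of the dynamic-programming evaluation, here is a minimal Java sketch that uses only the standard library. The class name NameDistance and the methods levenshtein and similarity are hypothetical names chosen for this example. Scaling the distance to a 0..1 score by dividing by the longer string's length is an assumption made here so the result can be compared with the scores in the question; it is not part of the Levenshtein definition.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not taken from any library.
final class NameDistance {

    private NameDistance() {}

    /** Minimum number of single-character insertions, deletions and substitutions. */
    static int levenshtein(String a, String b) {
        // Only two rows of the DP table are needed at any time.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Assumed normalisation: 1.0 means identical after trimming and lower-casing,
     * 0.0 means every position differs.
     */
    static double similarity(String a, String b) {
        String x = a.trim().toLowerCase(Locale.ROOT);
        String y = b.trim().toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(x, y) / longest;
    }
}
```

For example, NameDistance.levenshtein("elon mush", "elon musk") is 1 (one substitution), while NameDistance.levenshtein("musk", "elon musk") is 5, because the five characters of "elon " have to be inserted. A plain edit distance therefore still penalises a missing first name heavily, so comparing names token by token may be worth considering on top of it.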