Are there any tools or methods that can be used for matching by a person's name between two different data sources?
The systems have no other common information and the names have been entered differently in many cases.
Examples of non-exact matches:
King Jr., Martin Luther = King, Martin (exclude suffix)
Erving, Dr. J. = Erving, J. (exclude prefix)
Obama, Barak Hussein = Obama, Barak (exclude middle name)
Pufnstuf, H.R. = Pufnstuf, Haibane Renmei (match abbreviations)
Tankengine, Thomas = Tankengine, Tom (match common nicknames)
Flair, Rick "the Natureboy" = Flair, Natureboy (match on nickname)
I had to use a variety of techniques suggested. Thanks pointing me in the right direction(s). Hopefully, the following will help someone else out with this type of problem to solve.
Removing excess characters
Usage:
Split name into parts
Usage:
Common nicknames:
I created a table based on this list and used it to join on common name equivalents.
Usage:
Phonetic algorithms (Jaro Winkler):
The amazing article, Beyond SoundEx - Functions for Fuzzy Searching in MS SQL Server, shows how to install and use the SimMetrics library into SQL Server. This library lets you find relative similarity between strings and includes numerous algorithms. I ended up mostly using Jaro Winkler to match the names.
Usage: