What are some algorithms tailored specifically towards *human* name-matching?


I wanted to know whether there are any algorithms or libraries that specifically address the many issues of human name-matching. I came across the algorithms that fuzzy string-matching tools are built on, and found that they tend to yield a lot of false positives (even though they're pretty decent at catching true positives). For example, they will be very confident that 'Ken Williams' and 'Benjamin Williams' are the same person, when intuitively it's not very likely that a person named 'Ken' also goes by 'Benjamin'. Conversely, there are also some false negatives, albeit rarer ones. For example, they are not very confident that 'Cel Gonzalez' and 'Cel Gonzalez del Toro' are the same person, when intuitively they very likely could be.
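To make those two examples concrete, here is a quick sketch that uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for a generic character-level fuzzy matcher. That choice is an assumption: the question doesn't say which library was tried, and Levenshtein-based libraries will produce different numbers, though they share the same blind spot. The helper name `similarity` is just for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]: 2 * matched chars / total chars."""
    return SequenceMatcher(None, a, b).ratio()


pairs = [
    ("Ken Williams", "Benjamin Williams"),      # probably different people
    ("Cel Gonzalez", "Cel Gonzalez del Toro"),  # plausibly the same person
]

for x, y in pairs:
    print(f"{x!r} vs {y!r}: {similarity(x, y):.3f}")
```

Here the unrelated pair scores about 0.759 (the shared "n Williams" plus one "e"), while the plausible match scores about 0.727, because the extra " del Toro" inflates the total length. No single threshold can accept the second pair without also accepting the first, since the matcher has no notion of which characters belong to a given name versus a surname.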

Some other challenges that basic fuzzy algorithms simply don't address:

  • Nicknames and shortened names (e.g. 'William' vs. 'Bill', or 'Elizabeth' vs. 'Betsy').
  • Middle names reduced to initials (e.g. 'Marc Ditkovich Johnson' and 'Marc D. Johnson').
  • Omitted middle names or middle initials, and extended last names (e.g. 'John Carter' and 'John A. Carter', or 'Javier Valenzia' and 'Javier Valenzia-Escarcega').
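The three patterns listed above can be approximated with a few token-level rules instead of character-level distance. What follows is a minimal sketch, not an established library or algorithm: `NICKNAMES`, `tokens`, `compatible`, and `names_match` are hypothetical names, and the nickname table is a tiny illustrative sample that a real system would replace with a much larger curated list. The sketch assumes "given name first, surname last" ordering.

```python
import re

# Hypothetical, deliberately tiny nickname table mapping a short form to a
# canonical given name. A real system would need a large curated list.
NICKNAMES = {
    "bill": "william", "billy": "william", "will": "william",
    "betsy": "elizabeth", "beth": "elizabeth", "liz": "elizabeth",
    "ken": "kenneth", "ben": "benjamin",
}


def tokens(name: str) -> list[str]:
    """Lowercase, turn periods and hyphens into spaces, split on whitespace."""
    return re.sub(r"[.\-]", " ", name.lower()).split()


def compatible(a: str, b: str) -> bool:
    """Two tokens match if one is an initial of the other, or if they agree
    after mapping nicknames to canonical names (applied to every token for
    brevity, which is an assumption)."""
    if len(a) == 1 or len(b) == 1:
        # A lone letter is treated as an initial: compare first letters only.
        return a[0] == b[0]
    return NICKNAMES.get(a, a) == NICKNAMES.get(b, b)


def names_match(x: str, y: str) -> bool:
    """Hypothetical rule: the given names must be compatible, and every
    remaining token of the shorter name must appear, in order, among the
    remaining tokens of the longer one. This tolerates omitted middle names
    and extended surnames."""
    short, long_ = sorted((tokens(x), tokens(y)), key=len)
    if not short:
        return False
    if not compatible(short[0], long_[0]):
        return False
    # Greedy in-order subsequence check: `any` consumes the shared iterator
    # up to and including each match, so later tokens must match later ones.
    rest = iter(long_[1:])
    return all(any(compatible(t, u) for u in rest) for t in short[1:])


if __name__ == "__main__":
    examples = [
        ("Ken Williams", "Benjamin Williams"),
        ("Cel Gonzalez", "Cel Gonzalez del Toro"),
        ("William Smith", "Bill Smith"),
        ("Marc Ditkovich Johnson", "Marc D. Johnson"),
        ("John Carter", "John A. Carter"),
        ("Javier Valenzia", "Javier Valenzia-Escarcega"),
    ]
    for x, y in examples:
        print(f"{x!r} vs {y!r}: {names_match(x, y)}")
```

The design trades recall on the given name for precision: 'Ken' and 'Benjamin' are rejected outright unless the nickname table links them. The in-order subsequence rule is what lets 'Cel Gonzalez' match 'Cel Gonzalez del Toro'. It also means 'John Carter' would match 'John Carter Smith', so in practice this kind of rule is usually combined with a score or a human review step rather than used as a hard yes/no.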

Unfortunately, I haven't been able to find any libraries or algorithms that tackle this specific problem, which I expected to come up a lot more often in Google searches. Any help would be appreciated!
