please, I am working on a PoC for Person Real-time Identification, and one of the critical aspects of it is to support both minor misspelling and phonetic variations of First, Middle, and Last name. Like HarinGton == HarrinBton or RaphEAl == RafAEl. It's working for longer names, but it's a bit more imprecise for names like Lee and John.
I am using Double Metaphone through dmetaphone() and dmetaphone_alt() in PostgreSQL 13.3 (Supabase.io). And although I appreciate Double Metaphone it has a (too?) short string as the outcome. metaphone() has parameters to make the resulting phonetic representation longer. I investigated dmetaphone() and couldn't find anything other than the default function.
Is there a way of making dmetaphone() and dmetaphone_alt() return a longer phonetic representation similar to metaphone()'s, but with a ALT variation?.
Any help would be much appreciated.
Thanks
Looking at the postgres docs for these features you don't have parametric control over the length of the encoded string for Double Metaphone. In the case of single Metaphone, you can only truncate the output string:
However you may get much better results by using
Trigram SimilarityorLevenshtein Distanceon the encoded output from either of the metaphone methods - this can be a more powerful way to handle phonetic permutations using Metaphones.Example
Consider all the spelling permutations possible for the artist
Cyndi Lauper, using double metaphone with trigram similarity we can achieve 100% similarity between the incorrect stringcindy lorperand the correct spelling:yields:
similarity real: 1(ie: 100% similarity)Which means the encodings are identical for both input strings using Double Metaphone. When using Metaphone, they're slightly different. All of the following yield
SNTLRPRyields:
SNTLPRwhich is only one character different toSNTLRPRYou can also use Levenshtein Distance to calculate it, which gives you a filterable parameter to work with:
yields:
levenshtein integer: 1It's a bit difficult to see exactly what you're having trouble with - without a more complete reprex.
both yield:
similarity real: 1(ie: 100% similarity)Edit: here's a easy to follow guide for fuzzy matching with postgres