Teradata SQL to Extract Records Based on Approximate String Matching

We are on Teradata 14, and I come from a Netezza / Postgres (Redshift) background. I have been asked to extract login data from audit logs to find records/transactions where the same IP is submitting similar-looking usernames with small changes, e.g. Samir --> Samr --> Amir, in order to capture phishing activity. In Postgres we have fuzzy string functions such as the '%' operator, e.g. ColA % ColB (where the % operator is equivalent to a similarity test), as well as Soundex, Metaphone, Levenshtein, etc. In Teradata, however, the only one I have been able to find so far is Soundex. Is there any built-in function or method in Teradata 14 to achieve this kind of approximate string matching?

There is 1 best solution below

Rob Paller On

Teradata 14.x supports the Damerau-Levenshtein Distance algorithm via the EDITDISTANCE() function and n-gram pattern matching via the NGRAM() function.

You can find information about the EDITDISTANCE function here and the NGRAM() function here.
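
As a rough sketch of how EDITDISTANCE() might be applied to the audit-log case in the question, the query below pairs up the distinct usernames seen from each IP and keeps the pairs that are only a few edits apart. The table audit_log, its columns ip_address and username, and the threshold of 2 are all assumptions made for illustration; they are not taken from the question or from the Teradata documentation.

```sql
-- Hypothetical table: audit_log(ip_address, username, ...)
-- Goal: per IP, list pairs of distinct usernames that differ by a few edits.
SELECT a.ip_address,
       a.username AS username_a,
       b.username AS username_b,
       -- Damerau-Levenshtein distance; LOWER() so that case differences
       -- are not counted as edits
       EDITDISTANCE(LOWER(a.username), LOWER(b.username)) AS edit_dist
FROM (SELECT DISTINCT ip_address, username FROM audit_log) AS a
JOIN (SELECT DISTINCT ip_address, username FROM audit_log) AS b
  ON  a.ip_address = b.ip_address
  AND a.username < b.username   -- each unordered pair once, no self-pairs
WHERE EDITDISTANCE(LOWER(a.username), LOWER(b.username)) <= 2  -- assumed threshold
ORDER BY a.ip_address, edit_dist;
```

The self-join compares every pair of usernames within an IP, so it may be worth restricting the log to a time window first if some addresses have many distinct usernames. The threshold would need tuning against real data, since short usernames can be within two edits of each other by coincidence.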