SQL Server Soundex() and Difference() to Compare a Columns observations to Itself

155 Views Asked by At

I am looking for some assistance in using Soundex() and Difference() in SQL Server with the goal of comparing one observation in a column to another observation in that same column.

Here is some context of how the data looks. All of these street addresses are intended to represent the same place, but with human error changing a small portion so it isn't exactly the same:

Street Address zip state
1440 thisisastreetaddress ave 12345 OH
1440 thisismystreetaddress st 12345 OH
1440 thisisthestreetaddress rd 12345 OH

The goal would be to use Soundex() and Difference() to group addresses that are similar to each other. Problem is, the table has over 2 million customers so the solution cannot operate on massive amounts of CPU. The end product doesn't have to be pretty, I don't mind sorting manually (which I've realized is what may need to happen to an extent), but I would appreciate the script grouping them for me.

Let me know if its possible with soundex() and difference() and, if its not with those two, any other functions or ways you may go about solving this. Also may be important to include that it can be a multiple software solution as the team and myself know python, R, power BI, etc.

Thanks!

0

There are 0 best solutions below