Fuzzy searching from C#


I have a database of records in which several of the properties together form an address. I have a C# web app that searches by address, but I need more than just the wildcard symbol to retrieve matches. Is there a way to implement a fuzzy/rough search from the web app?

My two parameters are:
Address
Postcode

Only one of them needs to be populated to run a search, but searching with both parameters should also be possible.


There are 1 best solutions below


Fuzzy matching is usually not built into databases because there is no efficient way to index columns for it. You either have to run the fuzzy matching algorithm on every row, or create an index of every possible fuzzy match for each row. The first makes searching slow; the second makes insertions slow and drastically increases the size of the DB. Depending on the exact fuzzy match and tolerance you need, a hybrid solution may be possible, but it will not be a trivial task.

In my own experience with fuzzy matching, I always had one indexed field that had to match exactly, which limited the amount of data I had to run the fuzzy match on. If that is not possible in your case, building an index of all fuzzy matches might be the only solution.

Finally, you might want to step back and ask whether you really need a fuzzy match, or whether you just need to split the address look-up into the numerical part and the street name. Both can be extracted from the address the user enters before you attempt the look-up. You would then store the numerical and street portions of each address separately in your DB.
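As a rough illustration of extracting the two parts, here is a minimal sketch. The `AddressParser` class and its `Split` method are hypothetical names, and the pattern assumes the house number comes first and is made of digits with an optional single letter suffix (for example "12B"); real address data will need more cases than this.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressParser
{
    // Assumption: the number is at the start, digits plus an optional letter,
    // followed by whitespace and then the street name.
    private static readonly Regex LeadingNumber =
        new Regex(@"^\s*(\d+[A-Za-z]?)\s+(.+)$", RegexOptions.Compiled);

    // Returns the numerical part and the street part of a free-text address.
    // If no leading number is found, Number is empty and Street is the whole input.
    public static (string Number, string Street) Split(string address)
    {
        if (string.IsNullOrWhiteSpace(address))
            return (string.Empty, string.Empty);

        Match m = LeadingNumber.Match(address);
        if (!m.Success)
            return (string.Empty, address.Trim());

        return (m.Groups[1].Value.ToUpperInvariant(), m.Groups[2].Value.Trim());
    }
}
```

With this sketch, `AddressParser.Split("12b Main St")` yields the number "12B" and the street "Main St", while an input with no leading digits comes back with an empty number so the caller can decide how to handle it.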

EDIT

One option would be to do an exact match on the numerical portion of the address, get the results back from the DB, and then use the fuzzy match on the street portion to filter and order them. This could get tricky with oddball addresses that have no numerical part, or if the user spells out the number, as in "One Main St". The best way to pull this off would be to create separate columns for the numerical and street name portions of the address, which means updating your DB and doing some parsing on your existing data. You might also have to deal with other variations in the address, such as "SW" vs "South West", that could cause the fuzzy matching to fail.
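The filter-and-order step could be sketched as follows, assuming the candidate street names have already been fetched from the DB by an exact match on the number. Levenshtein edit distance is used here as one possible fuzzy measure; it is my choice for the example, not the only option. The `StreetMatcher` class, the abbreviation table, and the `maxDistance` tolerance are all hypothetical and would need tuning for real data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreetMatcher
{
    // Hypothetical abbreviation table; extend it for your own data.
    // Note that "st" can also mean "saint", which this sketch ignores.
    private static readonly Dictionary<string, string> Abbreviations =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["st"] = "street", ["rd"] = "road", ["ave"] = "avenue",
            ["sw"] = "south west", ["nw"] = "north west",
            ["se"] = "south east", ["ne"] = "north east"
        };

    // Lower-cases the street, drops commas and periods, expands abbreviations.
    public static string Normalize(string street)
    {
        var words = (street ?? string.Empty)
            .ToLowerInvariant()
            .Split(new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Abbreviations.TryGetValue(w, out var full) ? full : w);
        return string.Join(" ", words);
    }

    // Classic two-row Levenshtein edit distance.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Keeps candidates within maxDistance of the query, closest first.
    public static List<string> Rank(string query, IEnumerable<string> candidates, int maxDistance)
    {
        string q = Normalize(query);
        return candidates
            .Select(c => new { Street = c, Distance = Levenshtein(q, Normalize(c)) })
            .Where(x => x.Distance <= maxDistance)
            .OrderBy(x => x.Distance)
            .Select(x => x.Street)
            .ToList();
    }
}
```

Normalizing both sides before comparing is what lets "SW" and "South West" compare as equal. Because the fuzzy comparison only runs on the rows that survived the exact match, its per-row cost stays manageable.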