How can I properly sanitize the search string in C# without reducing the search performance?


I need to sanitize the user's input string because I later use it in an OLEDB query over an indexed document repository to find matching files, descriptions, etc. The problem is that the CONTAINS clause of the query cannot take strings that contain special characters.

Is there a better way than what I'm doing to sanitize without reducing accuracy?

What I'm currently doing is taking the search string, matching it against the regex [^0-9a-zA-Z\s\/\._-]+, and replacing every match with an empty string, which removes all special characters from the search string.
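For reference, the approach described above might look roughly like this in C#. This is only a sketch: the class and method names (SearchSanitizer, Sanitize) are made up for illustration, and only the regex comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchSanitizer
{
    // Character class taken from the question: matches any run of characters
    // that are NOT digits, ASCII letters, whitespace, '/', '.', '_' or '-'.
    // Compiled once and reused so repeated searches do not re-parse the pattern.
    private static readonly Regex Disallowed =
        new Regex(@"[^0-9a-zA-Z\s\/\._-]+", RegexOptions.Compiled);

    // Hypothetical helper: strips every disallowed character from the input.
    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }
        return Disallowed.Replace(input, string.Empty);
    }
}
```

With this pattern, an input such as "R&D $budget.txt" loses both the & and the $, which is the accuracy problem described next.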

My problem is that some files and descriptions contain special characters like & and $, and if I disallow any kind of special characters, the search accuracy would go down. Is there a more efficient way to do this?


There is 1 best solution below


Using a regex is definitely the correct way to go. I don't think any library-specific function or third-party library is needed for this task, or that one would outperform a regex. A few points:

Allow the special characters that might be present in a description (don't strip them out through the regex) and exclude the rest.

I assume you are caught in a loop here, though: CONTAINS won't take a special character, but you need some of the special characters. If that is the case, you can write a local function that does exactly what CONTAINS does, minus the check for the particular special characters you need, and use that local function to query in place of CONTAINS. I cannot think of any other obvious way than this.

Alternatively, overhaul the entire search logic and search on keys or fields that are going to stay unique and free of special characters. In any case, I don't think searching for files in a DB on the basis of their description is a very good idea.
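A minimal sketch of the first two suggestions, under stated assumptions: the whitelist from the question is extended with & and $ (the two characters the question mentions), and a hypothetical Filter method stands in for the "local function" that replaces CONTAINS. The names DescriptionSearch, Sanitize and Filter are invented for illustration, and the candidate descriptions are assumed to be already loaded in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DescriptionSearch
{
    // The question's whitelist plus '&' and '$'. Everything outside the
    // class is still removed. The trailing '-' is a literal hyphen.
    private static readonly Regex Disallowed =
        new Regex(@"[^0-9a-zA-Z\s\/\._&$-]+", RegexOptions.Compiled);

    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }
        return Disallowed.Replace(input, string.Empty).Trim();
    }

    // Hypothetical in-memory stand-in for CONTAINS: a case-insensitive
    // substring match that tolerates the special characters kept above.
    public static IEnumerable<string> Filter(IEnumerable<string> candidates, string rawTerm)
    {
        string term = Sanitize(rawTerm);
        if (term.Length == 0)
        {
            return Enumerable.Empty<string>();
        }
        return candidates.Where(c =>
            c != null && c.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Note that a filter like this scans the candidates itself rather than using the repository's index, so it is probably best applied to a result set that a broader query has already narrowed down.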