Oracle text query - contains and special characters

4.2k Views Asked by At

We are using an oracle text query to perform a search of a table based on user input.

So if the user types in "blue sky" - we want to search for each word on an OR basis, so we do - where contains(columnname,'blue or sky',1)>0

so we take the user input and replace spaces with the word 'or' before passing to the query.

This works fine and we are able to order descending on the score to give us the most relevant entries first.

However we have then had problem with 'special' characters - it started with a comma but then I found in the documentation there are loads of them.

So we wrote some code which detected each 'special' character and prefixed it with an escape '\' character. This also works ok. But then there are also the reserved words, eg AND - so if a user types in 'jack and jill' - we convert to 'jack or and or jill' and this gives a text query parser syntax error because of the use of the word 'and' - so then trying to cater for those too but having to try to determine where they are prefixed and suffixed with spaces so as not to pick out 'handy' for example. Except of course it could be the first or last word.....grrr there must be an easier way to do this.....

Then I read up on the {} braces option so this escapes the whole string.

Question - can I just do this, even if there are no special characters in the string?

Also I can't see how this caters for the OR functionality we need for each word - so if I do contains(columnname,'{jack or xxxxx}',1) > 0) it does not return anything.

Any advice will be greatfully received, thanks!

1

There are 1 best solutions below

4
On

Maybe I do not answer exactly your question, but why don't you use this way to search?

with input as (select 'blue red white' example from dual),
     split_rule as (select '[^ ]+' pattern from dual),
     input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
                     from input, split_rule
                     connect by level <= regexp_count(example,pattern)),
     search_table as (select 'blue sky' item from dual
                      union all
                      select 'green grass' from dual
                      union all
                      select 'red apple' from dual
                      union all
                      select 'orange juice' from dual)
select item string_found,
       word hit_by
from input_array,search_table
where item like '%'||word||'%';

This should be the same in performance; "materialize" hint to disallow Oracle to export connect by to outside.

Still if you want to parse the string to words outside of the query - simply create an Oracle temporary table, fill it on every request with user search words (imitate "input_array" from the query above) and use it.

EDIT 1: As far as you provided us with some additional information, I update the answer. The top part remains the same, simply change the query:

1) If your rating is based on different words only use this query:

with input as (select 'blue red white' example from dual),
     split_rule as (select '[^ ]+' pattern from dual),
     input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
                     from input, split_rule
                     connect by level <= regexp_count(example,pattern)),
     search_table as (select 'blue sky red' item from dual
                      union all
                      select 'green grass' from dual
                      union all
                      select 'red apple blue white' from dual
                      union all
                      select 'orange juice' from dual)
select item string_found, count(*) rate
from input_array,search_table
where item like '%'||word||'%'
group by item
order by 2 desc;

2) If your rating is based on overall amount of hits:

with input as (select 'blue red white' example from dual),
     split_rule as (select '[^ ]+' pattern from dual),
     input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
                     from input, split_rule
                     connect by level <= regexp_count(example,pattern)),
     search_table as (select 'blue sky red blue blue' item from dual
                      union all
                      select 'green grass' from dual
                      union all
                      select 'red apple blue white' from dual
                      union all
                      select 'orange juice' from dual)
select item string_found, sum(regexp_count(item,word)) rate
from input_array,search_table
where item like '%'||word||'%'
group by item
order by 2 desc;

EDIT 2: To use regexp_count in Oracle 10g replace it with query like

select length(no_double_spaces) - length(replace(no_double_spaces,' ')) + 1 amount_of_words
from (select trim(regexp_replace('blue  red white','[ ]+',' ')) no_double_spaces
      from dual);

And now I show how to use it with a first query of Edit 1:

with input as (select 'blue red white' example from dual),
     split_rule as (select '[^ ]+' pattern from dual),
     input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
                     from input, split_rule
                     connect by level <= (select length(no_double_spaces) - length(replace(no_double_spaces,' ')) + 1 amount_of_words
                                          from (select trim(regexp_replace(example,'[ ]+',' ')) no_double_spaces
                                                from input)
                                         )
                    ),
     search_table as (select 'blue sky red blue blue' item from dual
                      union all
                      select 'green grass' from dual
                      union all
                      select 'red apple blue white' from dual
                      union all
                      select 'orange juice' from dual)
select item string_found, count(*) rate
from input_array,search_table
where item like '%'||word||'%'
group by item
order by 2 desc;