Oracle text contains and markup have different behaviour

94 Views Asked by At

I have an Issue with Oracle text. I search like that:

WHERE
    ( contains(description_en, '%to_search.%', 1) > 0 );

Now Oracle returns me a row, where there is "to_search" but without the dot. I suspect that is because Oracle does handle that as wordbreak (Note: The Stoplist is empty).

But if I try to highlight the found result with a ctx_doc.markup call:

ctx_doc.markup(
      index_name => 'i_fil_lis_ce_fil_des_en',
      textkey => '12238',
      text_query => '%to_search.%',
      starttag => '<result>',
      endtag => '</result>',
      plaintext => true)

It does not highlight to_search unless I remove the dot in the text_key. This behaviour is the same for other symbols like ? / , / @. This makes no sense to me, shouldn't oracle treat both occurences of the dot the same?

I understand, that Oracle breaks the word at a dot and other symbols. So that if I search test it also finds test.. But why the other way around? why should I find test if I search for test.? and can I turn this off?

Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit

2

There are 2 best solutions below

0
ibre5041 On

Check lexer config. I guess that dot is simply ignored by lexer. And Text "vocabulary" does not contain any word "test."

1
psaraj12 On

As per the Oracle Text docs

Please note the non-alphabetic characters like . ? / etc are treated as whitespace (so "to_search." is treated as 'to_search '). To match only to_search. you must declare "." as a printjoin. (If you do, however, to_search. will not match 'to_search . ') For more on printjoins, see BASIC_LEXER