I ran into a weird behavior from Oracle this morning... And I can't understand why it acts this way from the docs. I'm sorry for the long post, but I want to make sure I'm understood. Oh, and make sure to read the note at the end before answering. :)
The goal of the request is to return rows with 1 or more lowercase characters. For the sake of the example, my table will be:
CREATE TABLE "TEMP_TABLE"
( "VAL" VARCHAR2(4000 BYTE) );
Insert into TEMP_TABLE (VAL) values ('00A00');
Insert into TEMP_TABLE (VAL) values ('00000');
Insert into TEMP_TABLE (VAL) values ('BC000');
Insert into TEMP_TABLE (VAL) values ('ABC00');
Insert into TEMP_TABLE (VAL) values ('AAAAA');
Insert into TEMP_TABLE (VAL) values ('abc00');
Using this SQL query:
select val,
case when regexp_like (val, '[a-b]')
then 'MATCH'
else 'NO'
end
from temp_table;
If the NLS_SORT value of the session is set to BINARY
, Oracle returns:
00A00 NO
00000 NO
BC000 NO
ABC00 NO
AAAAA NO
abc00 MATCH
All good here: the only word containing a lowercase letter matches - the others don't.
But if NLS_SORT is set to FRENCH
, the results are less understandable:
00A00 NO
00000 NO
BC000 MATCH
ABC00 MATCH
AAAAA NO
abc00 MATCH
From what I can deduce, the regexp matches when there are characters other than A
.
So my question is: Why would Oracle understand [a-z]
as "rows with letters that are not A
"?
Notes:
The database is Oracle 10G(r2), and the session's NLS parameters are these:
NLS_CALENDAR GREGORIAN NLS_COMP BINARY NLS_CURRENCY ¿ NLS_DATE_FORMAT DD/MM/RR HH24:MI NLS_DATE_LANGUAGE FRENCH NLS_DUAL_CURRENCY ¿ NLS_ISO_CURRENCY FRANCE NLS_LANGUAGE FRENCH NLS_LENGTH_SEMANTICS BYTE NLS_NCHAR_CONV_EXCP FALSE NLS_NUMERIC_CHARACTERS , NLS_SORT FRENCH_M NLS_TERRITORY FRANCE NLS_TIME_FORMAT HH24:MI:SSXFF NLS_TIMESTAMP_FORMAT DD/MM/RR HH24:MI:SSXFF NLS_TIMESTAMP_TZ_FORMAT DD/MM/RR HH24:MI:SSXFF TZR NLS_TIME_TZ_FORMAT HH24:MI:SSXFF TZR
Yes, I could use
regexp_like(val, '[[:lower:]]')
. But I found out about this later on, and it doesn't explain the weird behaviour.
For better or worse, the sort ordering defined by
nls_sort
is being used to evaluate the[a-z]
regexp. If you inserta,b,c,A,B,C
intotemp_table
and sort it under each setting you'll get the following:Since the uppercase letters are "interleaved" with the lowercase letters in the French setting it evaluates to true in Oracle's implementation.