Oracle REGEXP_REPLACE string delimiter handling not working

45 Views Asked by At

I have incoming data which is delimiter by a string token:

Data 1__##__Data Two__##__Third Data__##__4th Data__##__

I would like to split this by the delimiter __##__ to give:

Data 1
Data Two
Third Data
4th Data

I figured on looping and using this to get each data item in turn, but as a first stage to everything from the beginning to the nth item inclusive:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 1 ) from dual;

The above gives the desired Data 1, however:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 2 ) from dual;

expected to give Data1__##__Data Two instead gives the whole input string back unchanged.

Earlier I tried the following but this just returns blank:

SELECT REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL )
FROM dual
CONNECT BY REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL ) IS NOT NULL

This returns null.

2

There are 2 best solutions below

0
On BEST ANSWER

You do not need (slow) regular expressions and can use a recursive query and simple (faster) string functions:

WITH bounds (value, spos, epos) AS (
  SELECT value,
         1,
         INSTR(value, '__##__', 1)
  FROM   table_name
UNION ALL
  SELECT value,
         epos + 6,
         INSTR(value, '__##__', epos + 6)
  FROM   bounds
  WHERE  epos > 0
) SEARCH DEPTH FIRST BY value SET order_id
SELECT SUBSTR(value, spos, epos - spos) AS value 
FROM   bounds
WHERE  epos > 0

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' FROM DUAL;

Outputs:

VALUE
Data 1
Data Two
Third Data
4th Data

Your query does not work because Oracle does not support zero-width look-arounds (neither ahead nor behind, positive nor negative) in regular expressions. Although, even if Oracle did support it I'm not sure that it would give the answer you expect.

If you really want to use regular expressions, you can fix your query to:

SELECT REGEXP_SUBSTR(
         'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
         '(.*?)__##__',
         1,
         LEVEL,
         NULL,
         1
       ) AS value
FROM   dual
CONNECT BY REGEXP_COUNT(
         'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
         '(.*?)__##__'
       ) >= LEVEL;

Which outputs the same as the previous query (although slower) for a single row of input (but would start to generate exponentially increasing numbers of duplicate rows if you provide multiple rows of input).

fiddle

2
On

For the sake of argument, this method uses regexp_substr with connect by and it handles multiple rows of input data:

with tbl(id, str) as (
  select 1, 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' 
  from dual union all
  select 2, 'Data 5__##__Data six__##__Seventh Data__##__8th Data__##__' 
  from dual union all
  select 3, 'Data 9__##__Data Ten__##__Eleventh Data__##__12th Data__##__' 
  from dual
)
select id, level as element,
  regexp_substr(str, '(.*?)(__##__)', 1, level, null, 1) data
from tbl
connect by level <= regexp_count(str, '__##__')
and prior id = id
and prior sys_guid() is not null;

        ID    ELEMENT DATA                
---------- ---------- --------------------
         1          1 Data 1              
         1          2 Data Two            
         1          3 Third Data          
         1          4 4th Data            
         2          1 Data 5              
         2          2 Data six            
         2          3 Seventh Data        
         2          4 8th Data            
         3          1 Data 9              
         3          2 Data Ten            
         3          3 Eleventh Data       
         3          4 12th Data           

12 rows selected.