extract values after comma's using regexp

384 Views Asked by At

Am using Oracle 19c database

Below is my string value

variable B1 varchar2(60)
exec :B1:='(199,''TEST121''),(156,''TEST''),(1561,''TEST99'')';

I want the output as

|    ID    |     NAME       |
| -------- | -------------- |
|       199|   TEST121      |
|       156|   TEST         |
|      1561|   TEST99       |
select  regexp_substr(regexp_substr(:b1,'[^A-Z+0-9][0-9]+', 1,level),'[0-9]+') as id , regexp_substr(:b1,'[A-Z]+[0-9]', 1,level) as name from dual connect by regexp_substr(:b1,'[0-9]', 1,level) is not null;

This query is only giving output for string values ending with digits.

2

There are 2 best solutions below

3
On BEST ANSWER

Here's one option:

SQL> WITH
  2     test (col)
  3     AS
  4        (SELECT '(199,''TEST121''),(156,''TEST''),(1561,''TEST99'')' FROM DUAL)
  5  SELECT SUBSTR (str, 1, INSTR (str, ',') - 1) id,
  6         SUBSTR (str, INSTR (str, ',') + 1) name
  7    FROM (    SELECT REGEXP_SUBSTR (
  8                        REPLACE (
  9                           REPLACE (
 10                              REPLACE (REPLACE (col, '),(', '#'), CHR (39), ''),
 11                              '(',
 12                              ''),
 13                           ')',
 14                           ''),
 15                        '[^#]+',
 16                        1,
 17                        LEVEL) str
 18                FROM test
 19          CONNECT BY LEVEL <= REGEXP_COUNT (REPLACE (col, '),(', '#'), '#') + 1);

ID                               NAME
-------------------------------- --------------------------------
199                              TEST121
156                              TEST
1561                             TEST99

SQL>

What does it do?

  • lines #1 - 4 - sample data
  • lines #8 - 14 - replace ),( with # (to get a simpler separator); remove leading and trailing brackets
  • lines #7 - 19 - split sample string into rows
  • lines #5, 6 - extract ID and NAME out of each row
0
On

Detailed explanation of the option given by @Littlefoot

Below is the test case for which multivalued string is converted in to columns and rows.

    set lines 999 pages 999
    col ID for a20
    col NAME for a20
    
    variable B1 varchar2(60)
    exec :B1:='(99,''TABLE1''),(56,''INDEX1''),(199,''TABLE''),(156,''INDEX'')';
    variable B1 varchar2(100)
    exec :B1:='(''TABL234E1~'',99),(''I1NDEX1~'',5ABC6),(''TAB1LE'',4ABC0),(''IND11EX'',6ACDE0)';
    
    WITH
       test (col)
       AS
          (SELECT :b1 FROM DUAL)
    SELECT SUBSTR (str, 1, INSTR (str, ',') - 1) id,
           SUBSTR (str, INSTR (str, ',') + 1) name
      FROM (    SELECT REGEXP_SUBSTR (
                          REPLACE (
                             REPLACE (
                                REPLACE (REPLACE (col, '),(', '#'), CHR (39), ''),
                                '(',
                                ''),
                             ')',
                             ''),
                          '[^#]+',
                          1,
                          LEVEL) str
                  FROM test
            CONNECT BY LEVEL <= REGEXP_COUNT (REPLACE (col, '),(', '#'), '#') + 1);

explanation:

    variable B1 varchar2(60)
    
    exec :B1:='(99,''TABLE1''),(56,''INDEX1''),(199,''TABLE''),(156,''INDEX'')';

STEP1

    WITH
      test (col)
      AS
         (SELECT :b1 FROM DUAL)
      select col from test;

This step helps in getting base string in oracle row format.

Result:

     (99,'TABLE1'),(56,'INDEX1'),(199,'TABLE'),(156,'INDEX')

STEP2

    WITH
      test (col)
      AS
         (SELECT :b1 FROM DUAL)
      select REPLACE(col, '),(','#') from test;

This step helps in replacing '),(' with '#' (to get a simpler separator for multivalued string).

Result :

    (99,'TABLE1'#56,'INDEX1'#199,'TABLE'#156,'INDEX')

STEP3

    WITH
      test (col)
      AS
         (SELECT :b1 FROM DUAL)
      select REPLACE(REPLACE(col, '),(','#'),CHR(39),'') from test;

This step helps in replacing single quotes from the preceding step.

Result:

    (TABL234E1~,99#I1NDEX1~,5ABC6#TAB1LE,4ABC0#IND11EX,6ACDE0)

STEP4

    WITH
      test (col)
      AS
         (SELECT :b1 FROM DUAL)
      select REPLACE(REPLACE(REPLACE(col, '),(','#'),CHR(39),''),'(','') from test;

This step helps in replacing '(' with '' (to remove the leading bracket from preceding step).

Result:

    99,TABLE1#56,INDEX1#199,TABLE#156,INDEX)

STEP5

    WITH
      test (col)
      AS
         (SELECT :b1 FROM DUAL)
      select REPLACE(REPLACE(REPLACE(REPLACE(col, '),(','#'),CHR(39),''),'(',''),')','') from test;

This step helps in replacing ')' with '' (to remove the trailing bracket from preceding step).

Result:

   99,TABLE1#56,INDEX1#199,TABLE#156,INDEX

STEP6

    WITH
       test (col)
       AS
          (SELECT :b1 FROM DUAL)
    SELECT *
      FROM (    SELECT REGEXP_SUBSTR (
                          REPLACE (
                             REPLACE (
                                REPLACE (REPLACE (col, '),(', '#'), CHR (39), ''),
                                '(',
                                ''),
                             ')',
                             ''),
                          '[^#]+',
                          1,
                          LEVEL) str
                  FROM test
            CONNECT BY LEVEL <= REGEXP_COUNT (REPLACE (col, '),(', '#'), '#') + 1);

This step have 2 components :

    a. REGEXP_SUBSTR(<value from previous step>,'[^#]+',1,LEVEL)
    '[^#]+' ==> searches and gets every character other than '#' 
    1==> starting position of the searches
    LEVEL==> LEVEL can be used in conjunction with CONNECT BY LEVEL clause. All the chunks of the source string can be displayed by using the LEVEL keyword as the match occurrence. 

    b. CONNECT BY LEVEL <= REGEXP_COUNT (REPLACE (col, '),(', '#'), '#') + 1
    REPLACE (col, '),(', '#') ==> This helps in replacing '),(' with '#' (to get a simpler separator for multivalued string).
    REGEXP_COUNT(<output from previous step>,'#')+1==> Count the number of '#' +1 in the source string. 
    Here, the CONNECT BY LEVEL clause generates the rows equal to the number of '#' +1 in the source string. 

Result:

    99,TABLE1
    56,INDEX1
    199,TABLE
    156,INDEX

STEP7

    WITH
       test (col)
       AS
          (SELECT :b1 FROM DUAL)
    SELECT SUBSTR (str, 1, INSTR (str, ',') - 1) id,
           SUBSTR (str, INSTR (str, ',') + 1) name
      FROM (    SELECT REGEXP_SUBSTR (
                          REPLACE (
                             REPLACE (
                                REPLACE (REPLACE (col, '),(', '#'), CHR (39), ''),
                                '(',
                                ''),
                             ')',
                             ''),
                          '[^#]+',
                          1,
                          LEVEL) str
                  FROM test
            CONNECT BY LEVEL <= REGEXP_COUNT (REPLACE (col, '),(', '#'), '#') + 1);
    

This step have 2 components : a.SUBSTR (str, 1, INSTR (str, ',') - 1)==> str is tken from preceding output and starting from 1st postion it greps the characters from ',' to -1 position. b.SUBSTR (str, INSTR (str, ',') + 1) name==> str is tken from preceding output and starting from ',' it greps the characters from ',' to +1 position.

Result:

    ID                   NAME
    -------------------- --------------------
    99                   TABLE1
    56                   INDEX1
    199                  TABLE
    156                  INDEX