Oracle regular expression match string from last occurence

682 Views Asked by At

I'm still learning regexp in oracle and stuck with below error. Below is my sample code

SELECT DISTINCT COALESCE(TO_NUMBER(regexp_substr(USERNAME, '[^.]+', 1, 2)), ID) ID , 
                COALESCE(regexp_substr(USERNAME, '[^.]+', 1, 1), USERNAME) AS USERNAME 
  FROM logs;
ORA-01722: invalid number 
01722. 00000 -  "invalid number"
*Cause:    The specified number was invalid.
*Action:   Specify a valid number.

Table Data

Username                ID
Ravi.1234              1234
Krishna.12345          12345
Ravi.Krishna.1234567   1234567
R.Krishna.987          987
Ravi.K.567890          567890
R.Krish                123
Ravi                   456

Expected Output

ID             Username
1234            Ravi
12345           Krishna
1234567         Ravi.Krishna
987             R.Krishna
567890          Ravi.K

How to reframe the query to get the output needed. Can substr be used instead of regexp will it give desired output? This is used in oracle database not in sql. Thanks in advance.

2

There are 2 best solutions below

0
On

If I understood your assignment correctly (see my comments under your question), here is how you can do this with standard string functions and conditions:

with
  table_data (username, id) as (
    select 'Ravi.1234'           , '1234'    from dual union all
    select 'Krishna.12345'       , '12345'   from dual union all
    select 'Ravi.Krishna.1234567', '1234567' from dual union all
    select 'R.Krishna.987'       , '987'     from dual union all
    select 'Ravi.K.567890'       , '567890'  from dual union all
    select 'R.Krish'             , '123'     from dual union all
    select 'Ravi'                , '456'     from dual
  )
select id, substr(username, 1, instr(username, '.', -1) - 1) as username
from   table_data
where  username like '%.' || id
;

ID      USERNAME            
------- --------------------
1234    Ravi                
12345   Krishna             
1234567 Ravi.Krishna        
987     R.Krishna           
567890  Ravi.K  

In the LIKE condition in the WHERE clause, % is a wildcard for "any string of any length, including zero"; that must be followed by a literal dot and then by the ID, and that must be the whole USERNAME string. In select, instr(username, '.', -1) finds the position of the "first" dot in username, but counting from the end and moving left - that is what the minus sign means.

With regular expression functions and conditions:

select id, regexp_substr(username, '^(.*)\.' || id || '$', 1, 1, null, 1) as username
from   table_data
where  regexp_like(username, '\.' || id || '$')
;

The sixth argument to regexp_substr means "the first substring enclosed in parentheses" (first "capture group" is the technical term).

2
On

I think REGEXP_REPLACE() would suit well for your case while filtering out the values having at least one digit. In the current case, you're trying to convert the second portions of the Username strings to number, but not all of them are numeric, the error raises due to this. Moreover, you can also extract the ID column from Username column. e.g. no need to hold seperate ID column within your original table.

Thus, consider using

SELECT TO_NUMBER( REGEXP_REPLACE(Username, '[^0-9]+') ) AS ID,
       RTRIM( REGEXP_REPLACE(Username, '[^.]+$'),'.') AS "Username"
  FROM logs
 WHERE REGEXP_LIKE(Username,'[0-9]')

the following option would be an alternative to above one by using less Regular expression

SELECT TO_NUMBER( SUBSTR( Username, INSTR(Username, '.',-1)+1, LENGTH( Username ) )) AS ID,
       SUBSTR( Username, 1, INSTR(Username, '.',-1)-1 ) AS "Username"
  FROM logs
 WHERE REGEXP_LIKE(Username,'[0-9]') 

Demo