problem with special characters in regexp funcion

57 Views Asked by At

I parse string in Oracle but names of providers have a special characters like '&' and '-'. And i can't handle with it. Could anyone help me ?

SELECT arraylist.* , inserted.*
FROM (
    select trim(regexp_substr(str,'[^;]+', 1, level)) as str1
    from ( 
       SELECT ('Cogent Communications Poland Sp. z.o.o. - 100000Mbps;E-point - 100000Mbps; T-Mobile - 100000Mbps; Net  Friends - 100000Mbps' ) as Str 
       FROM dual
       
    
    )
    connect by regexp_substr(str, '[^;]+', 1, level) is not null
) inserted
CROSS APPLY(
    select trim(regexp_substr(str1,'[^-]+', 1, 1)) as key,
           trim(regexp_substr(str1,'[^-]+', 1, 2)) as value
    from dual
) arraylist

And a results are: enter image description here

1

There are 1 best solutions below

0
On

Here's one way to handle multi-character delimited strings. First CTE just sets up data. tbl_case is the result of splitting on the semi-colon. Lastly split on the delimiter of ' - ' to get your name-value pairs.

Note a couple of things. The values are not realistic, but serve to differentiate the differing rows. The regex form of '(.*?)(;|$)' handles NULL elements properly, should they occur. See this post for more info on that.

-- Set up data
WITH tbl(ID, DATA) AS (
  SELECT 1, 'Cogent Communications Poland Sp. z.o.o. - 100001Mbps;E-point - 100002Mbps; T-Mobile - 100003Mbps; Net  Friends - 100004Mbps' FROM dual UNION ALL
  SELECT 2, 'Cogent Communications Poland Sp. z.o.o. - 200001Mbps;E-point - 200002Mbps; T-Mobile - 200003Mbps; Net  Friends - 200004Mbps' FROM dual
),
-- Split on semi-colons
tbl_case(ID, CASE) AS (
SELECT ID,
       TRIM(REGEXP_SUBSTR(DATA, '(.*?)(;|$)', 1, LEVEL, NULL, 1)) CASE
FROM tbl
CONNECT BY REGEXP_SUBSTR(DATA, '(.*?)(;|$)', 1, LEVEL) IS NOT NULL 
  AND PRIOR ID = ID
  AND PRIOR SYS_GUID() IS NOT NULL        
)  
--select * from tbl_case;         
-- Parse cases into name/value pairs
SELECT ID,
       REGEXP_REPLACE(CASE, '^(.*) - .*', '\1') name,
       REGEXP_REPLACE(case, '.* - (.*)$', '\1') value
from tbl_case



        ID NAME                                     VALUE               
---------- ---------------------------------------- --------------------
         1 Cogent Communications Poland Sp. z.o.o.  100001Mbps          
         1 E-point                                  100002Mbps          
         1 T-Mobile                                 100003Mbps          
         1 Net  Friends                             100004Mbps          
         2 Cogent Communications Poland Sp. z.o.o.  200001Mbps          
         2 E-point                                  200002Mbps          
         2 T-Mobile                                 200003Mbps          
         2 Net  Friends                             200004Mbps          

8 rows selected.