regexp_substr all occurence of doc. XXX

176 Views Asked by At

Here a sample code:

set define off;
drop table brol;

create table brol
(
br varchar2(4000)
);

INSERT INTO brol ( br ) VALUES ( 'Proposition de décret institutionnalisant l assemblée citoyenne et le conseil citoyen, déposée par Madame Schyns (Doc. 221 (2019-2020) N° 1 et 1bis)
Proposition de décret spécial modifiant les articles 2, 5, 6 et 7 du décret spécial du 19 juillet 2018 instituant la consultation populaire, déposée par Madame Schyns (Doc. 222 (2019-2020) N° 1)
Proposition de modification du Règlement du Parlement de Wallonie visant à institutionnaliser les assemblées citoyennes et conseils citoyens, déposée par Madame Schyns (Doc. 223 (2019-2020) N° 1)
Rapporteur : Monsieur Sahli
Proposition de décret spécial modifiant les articles 2, 5, 6 et 7 du décret spécial du 19 juillet 2018 instituant la consultation populaire en vue d octroyer un droit d initiative aux commissions délibératives composées de députés et de citoyens tirés au sort, déposée par Madame Schyns, Messieurs Desquesnes, Antoine et Bastin (Doc. 278 (2020-2021) N° 1)
Désignation d un Rapporteur 
Discussion - Votes
' );

commit;

select * from brol;

When i execute the next query:

select REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)') from brol;

i have as result:

Doc. 221

That a first good step and I was very happy to find it... But I need also to have as return:

Doc. 222 
Doc. 223
Doc. 278 

I m not able to have all 3 at the same moment.. Could you please help me to find what i forgot in the regexp_substr ?

select REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)',1,1) from brol;

Doc. 221

select REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)',1,2) from brol;

Doc. 222

select REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)',1,3) from brol;

Doc. 223

select REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)',1,4) from brol;

Doc. 278

Is there a way to know how many I ll find ? ==> nO it s not possible..

I found the approach with a function https://stackoverflow.com/a/23683522/10710461

create or replace function regexp_substr_mr (

p_data clob, p_re varchar ) return varchar as v_cnt number; v_results varchar(4000); begin v_cnt := regexp_count(p_data, p_re, 1,'m'); if v_cnt < 25 then for i in 1..v_cnt loop v_results := v_results || regexp_substr(p_data,p_re,1,i,'m') || chr(13) || chr(10); end loop; else v_results := 'WARNING more than 25 matches found'; end if;

return v_results; end;

But i m not able to adapt it...

Thx,

R.D

4

There are 4 best solutions below

0
On

the function was perfect.. My call was not...

select regexp_substr_mr(br,'Doc. ([[:alnum:]]+\.?)' ) from brol;

It s working. Thank you all for your answer. I ll check what is exactly the "connect" option cause I dont know it. I m in a package so it s easy for me to add a function.

3
On

You have the right query and just use connect by to find all,

select REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)',1,level) 
  from brol
connect by REGEXP_SUBSTR(br,'Doc. ([[:alnum:]]+\.?)',1,level) is not null;
0
On

Is there a way to know how many I'll find?

Use REGEXP_COUNT:

SELECT REGEXP_COUNT( br, 'Doc\.\s+\d+' ) AS num_docs
FROM   brol;

Which outputs:

| NUM_DOCS |
| -------: |
|        4 |

Your function appears to output a CR-NL separated list of docs. Instead you can use:

SELECT RTRIM(
         REGEXP_REPLACE( br, '.*?(Doc\.\s+\d+\.?|$)', '\1' || CHR(13) || CHR(10), 1, 0, 'n' ),
         CHR(13) || CHR(10)
       ) AS docs
FROM   brol

Which outputs:

| DOCS     |
| :------- |
| Doc. 221 |
| Doc. 222 |
| Doc. 223 |
| Doc. 278 |
| :------- |

If you want the values on separate rows (and can have multiple rows in your brol table) then you can use:

WITH docs ( br, doc, lvl, num_docs ) AS (
  SELECT br,
         REGEXP_SUBSTR( br, 'Doc\.\s+\d+\.?', 1, 1 ),
         1,
         REGEXP_COUNT( br, 'Doc\.\s+\d+\.?' )
  FROM   brol
UNION ALL
  SELECT br,
         REGEXP_SUBSTR( br, 'Doc\.\s+\d+\.?', 1, lvl + 1 ),
         lvl + 1,
         num_docs
  FROM   docs
  WHERE  lvl < num_docs
)
SELECT doc
FROM   docs
WHERE  lvl <= num_docs;

Which outputs:

| DOC      |
| :------- |
| Doc. 221 |
| :------- |
| Doc. 222 |
| :------- |
| Doc. 223 |
| :------- |
| Doc. 278 |
| :------- |

db<>fiddle here

0
On

For simplicity, I added yet another column to your table - ID (which kind of uniquely identifies each row).

SQL> desc brol
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 BR                                                 VARCHAR2(4000)

I also added another row into the table by modifying INSERT you posted; it contains two "Doc." values.

Finally, such a query might return what you want:

SQL> select id,
  2    regexp_substr(br, 'Doc\.\s+\d+.?', 1, column_value) result
  3  from brol cross join table(cast(multiset(select level from dual
  4                                           connect by level <= regexp_count(br, 'Doc\.')
  5                                          ) as sys.odcinumberlist))
  6  order by id, result;

        ID RESULT
---------- ----------
         1 Doc. 221
         1 Doc. 222
         1 Doc. 223
         1 Doc. 278
         2 Doc. 666
         2 Doc. 777

6 rows selected.

SQL>