I have a piece of text that is repeated several times. Here is a sample of that text:
The idea is to have a regular expression with three groups and to apply it to every match throughout the text. Here is an example of a possible match:
group1 = HORIZON-CL5-2021-D1-01
group2 (Opening) = 15 Apr 2021
group3 (Deadline(s)) = 07 Sep 2021

group1 = HORIZON-CL5-2022-D1-01-two-stage
group2 (Opening) = 04 Nov 2021
group3 (Deadline(s)) = 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)
I am trying with this regular expression:
\n(HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}).*?^Opening
It almost works. I need the regular expression to handle two more things:
- In some cases, extra text might appear after the last number of HORIZON..., as in the second case:
HORIZON-CL5-2022-D1-01 -two-stage
- I need to match everything until the word 'Opening:' appears at the beginning of a line. I thought I was doing this with this part of the expression:
.*?^Opening
but it does not seem to be correct.
How can I solve this?
To get the -two-stage part in group 1, you can add \S* (zero or more non-whitespace characters) to the end of the existing group.

You don't need the s modifier to make the dot match a newline. Instead, you can match all lines that do not start with Opening using a negative lookahead, then match Opening and capture the date and the deadline part in capture groups.

Note that you can omit {1}, since matching exactly once is the default for any token.
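As a rough sketch of that approach in Python: the sample text below is hypothetical, and it assumes each block starts with the HORIZON identifier on its own line, followed by some other lines, and then by lines starting with Opening: and Deadline(s):. Adjust the pattern if your real layout differs.

```python
import re

# Hypothetical sample; the real layout of the text is an assumption.
text = """\
HORIZON-CL5-2021-D1-01
Some call title
Opening: 15 Apr 2021
Deadline(s): 07 Sep 2021
HORIZON-CL5-2022-D1-01-two-stage
Another call title
More descriptive text
Opening: 04 Nov 2021
Deadline(s): 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)
"""

pattern = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)"  # group 1: identifier plus optional suffix
    r"(?:\n(?!Opening:).*)*"            # skip lines that do not start with Opening:
    r"\nOpening: (.+)"                  # group 2: opening date
    r"\nDeadline\(s\): (.+)",           # group 3: deadline(s)
    re.MULTILINE,                       # let ^ match at the start of every line
)

matches = pattern.findall(text)
for call_id, opening, deadlines in matches:
    print(call_id, "|", opening, "|", deadlines)
```

Because the dot never has to cross a newline here, only the multiline flag is needed so that ^ anchors at each line start.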
You could make the groups that start with a date-like part as specific as you want, as .+ is a broad match.
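One possible way to tighten the date groups, sketched in Python: the DATE sub-pattern and the sample text are assumptions for illustration (two digits, a three-letter month, four digits), not something taken from your data.

```python
import re

# Assumed date shape: "15 Apr 2021" (two digits, three-letter month, four digits).
DATE = r"\d{2} [A-Z][a-z]{2} \d{4}"

strict = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)"  # group 1: identifier plus optional suffix
    r"(?:\n(?!Opening:).*)*"            # skip lines that do not start with Opening:
    rf"\nOpening: ({DATE})"             # group 2: must look like a date
    rf"\nDeadline\(s\): ({DATE}.*)",    # group 3: starts with a date, rest of the line
    re.MULTILINE,
)

# Hypothetical sample: the second block has no real dates and should be skipped.
text = """\
HORIZON-CL5-2021-D1-01
Some call title
Opening: 15 Apr 2021
Deadline(s): 07 Sep 2021
HORIZON-CL5-2022-D1-02
Another call title
Opening: to be announced
Deadline(s): unknown
"""

strict_matches = strict.findall(text)
print(strict_matches)
```

With the stricter groups, a block whose Opening: line does not contain a date-like value no longer produces a match, whereas .+ would have accepted it.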