regular expression for matching everything until a word is found


I have a piece of text that is repeated several times. Here you have a sample of that text:

DEMO of the text

The idea is to have a regular expression with three groups and apply it to every match throughout the text. Here is an example of the matches I expect:

group1 = HORIZON-CL5-2021-D1-01
group2 (Opening) = 15 Apr 2021
group3 (Deadline(s)) = 07 Sep 2021


group1 = HORIZON-CL5-2022-D1-01-two-stage
group2 (Opening) = 04 Nov 2021
group3 (Deadline(s)) = 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)

I am trying with this regular expression:

\n(HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}).*?^Opening

It almost works. I need the regular expression to express two more things:

  1. In some cases, extra text may appear after the last number of HORIZON..., as in the second case:

HORIZON-CL5-2022-D1-01 -two-stage

  2. I need to capture everything until the word 'Opening:' appears at the beginning of a line. I thought I was doing this with the .*?^Opening part of the expression, but it seems that is not correct.

How can I solve this?

3

There are 3 best solutions below

4
On BEST ANSWER

To get -two-stage into group 1, you can append \S*, which matches zero or more non-whitespace characters, to the existing group.

You don't need the s modifier to make the dot match a newline. Instead, you can match all lines that do not start with Opening using a negative lookahead, and then match Opening and capture the date and the deadline part in a capture group.

Note that you can omit {1}, since a token without a quantifier already matches exactly once.

^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)(?:\r?\n(?!Opening\b).*)*\r?\nOpening: (.+)\r?\nDeadline\(s\): (.+)

Regex demo
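As a rough illustration, the pattern above can be tried in Python's `re` module with the `re.MULTILINE` flag so that `^` anchors at each line start. The sample text below is hypothetical: its layout is inferred from the pattern and the expected groups in the question, not copied from the original demo text.

```python
import re

# Hypothetical sample; the real input is only linked in the question,
# so the descriptive lines here are placeholders.
SAMPLE = """\
HORIZON-CL5-2021-D1-01
Some descriptive line
Opening: 15 Apr 2021
Deadline(s): 07 Sep 2021
HORIZON-CL5-2022-D1-01-two-stage
Another descriptive line
Opening: 04 Nov 2021
Deadline(s): 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)
"""

PATTERN = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)"   # group 1: call identifier plus optional suffix
    r"(?:\r?\n(?!Opening\b).*)*"         # skip every line that does not start with Opening
    r"\r?\nOpening: (.+)"                # group 2: opening date
    r"\r?\nDeadline\(s\): (.+)",         # group 3: deadline(s)
    re.MULTILINE,                        # make ^ match at the start of each line
)

# findall returns one (group1, group2, group3) tuple per match.
matches = PATTERN.findall(SAMPLE)
for call_id, opening, deadlines in matches:
    print(call_id, "|", opening, "|", deadlines)
```

Because the dot never has to cross a newline, no dotall (`s`) flag is needed; only the multiline flag matters here.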

You could make the groups that capture the date-like parts as specific as you want, since .+ is a broad match.

For example

^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)(?:\r?\n(?!Opening\b).*)*\r?\nOpening: (\d{2} [A-Z][a-z]{2} \d{4})\r?\nDeadline\(s\): (\d{2} [A-Z][a-z]{2} \d{4}.*)

Regex demo
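A small sketch of what the stricter variant buys you, again in Python and again with made-up records: a block whose Opening line does not look like a date is skipped instead of being captured.

```python
import re

STRICT = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)"
    r"(?:\r?\n(?!Opening\b).*)*"
    r"\r?\nOpening: (\d{2} [A-Z][a-z]{2} \d{4})"            # e.g. 15 Apr 2021
    r"\r?\nDeadline\(s\): (\d{2} [A-Z][a-z]{2} \d{4}.*)",   # date, then any trailing text
    re.MULTILINE,
)

# Hypothetical records: the second one has no real dates.
TEXT = """\
HORIZON-CL5-2021-D1-01
Opening: 15 Apr 2021
Deadline(s): 07 Sep 2021
HORIZON-CL5-2021-D2-01
Opening: to be announced
Deadline(s): to be announced
"""

strict_matches = [m.groups() for m in STRICT.finditer(TEXT)]
print(strict_matches)
```

With the broader `.+` version, the second record would also match and "to be announced" would end up in groups 2 and 3.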

1
On

You can have something like this: HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}(-[^\s]*)? . I added the (-[^\s]*)? part. Here I am telling the regex to match something that starts with - until a white space (\s) is found. The ? makes this part optional so it can show up once or not at all.
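To see the optional group at work, here is a minimal check in Python using the two identifiers from the question. The variable names are only illustrative.

```python
import re

# The pattern from this answer, unchanged.
CODE = re.compile(r"HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}(-[^\s]*)?")

plain = CODE.fullmatch("HORIZON-CL5-2021-D1-01")
staged = CODE.fullmatch("HORIZON-CL5-2022-D1-01-two-stage")

# The optional group did not participate in the first match, so it is None.
print(plain.group(1))
# In the second match it holds the suffix, including the leading hyphen.
print(staged.group(1))
```

Note that this keeps the suffix in its own group rather than inside the identifier; use `group(0)` if you want the full identifier with the suffix attached.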

0
On
  1. In your pattern, the \S+ in the first group could also match a repeated HORIZON-... sequence, e.g. HORIZON-()-A1-11HORIZON-+-B2-33. Since this should not appear in your input, it should not be a problem.

  2. Opening is required in your pattern. I would replace it with a positive lookahead, (?=Opening|$), where $ denotes the end of the line (or the end of the string when the multiline flag is not set).

  3. It seems you are not doing anything with the parts of the string you are retrieving. Judging from your examples, I think you could simply match non-space characters.

const pattern = /\n(HORIZON-\S+)\s*(.*?)\s*(?=Opening|$)/
  4. If you want to keep the original pattern and capture the rest of the text in a separate group, it would be /\n(HORIZON-\S+-[A-Z]{1}\d{1}-\d{2})(\S*)\s*(.*?)\s*(?=Opening|$)/.

  5. An expression beginning with '\n' does not match the first line. You could change it to /^(HORIZON-\S+-[A-Z]{1}\d{1}-\d{2})(\S*)\s*(.*?)\s*(?=Opening|$)/.