Regex: alternation order issue


When using alternation in a regex, the alternatives should be listed in a deliberate order, because the engine is eager: it takes the first alternative that matches at the current position, even if a later one would match more text.

So for a list such as co,co.,co-op,association,assoc, the longer items should come first to get the most precise match. The list should therefore be reordered to association,assoc,co-op,co.,co.
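As a small illustration of the ordering point, here is a sketch using Python's `re` module. The question does not name a language, so Python is an assumption, and the sample text and variable names are made up for the example.

```python
import re

# Hypothetical sample text containing the longer forms of each item.
text = "association co-op co."

# Shorter alternatives listed first: the engine takes the first one that matches.
short_first = re.compile(r"co|co\.|co-op|assoc|association")

# Longer alternatives listed first, as the reordered list suggests.
long_first = re.compile(r"association|assoc|co-op|co\.|co")

print(short_first.findall(text))  # ['assoc', 'co', 'co']
print(long_first.findall(text))   # ['association', 'co-op', 'co.']
```

With the short items first, `co` wins before `co-op` or `co.` is ever tried, which is why the longest-first order gives the more precise matches.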

I have a basic regex pattern that splits a string in two when it contains a hyphen or slash, so that I keep just the part before the hyphen or slash:

(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)

However, this regex splits incorrectly when given ABC CO-OP ELEMENTARY SCHOOL: the string becomes just ABC CO. If I remove CO from the alternation, the string is returned in its original form, ABC CO-OP ELEMENTARY SCHOOL, which is correct. In addition, the string ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE should become ARMSTRONG CO-OP ELEMENTARY SCHOOL, without the part after the slash.
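The behaviour described above can be reproduced with a short sketch. It assumes Python's `re` module and assumes the part to keep is read from the first capture group; the helper name `before_separator` is hypothetical.

```python
import re

# Pattern exactly as posted above (the dot in CO. is left unescaped).
pattern = re.compile(r"(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)")

def before_separator(name):
    """Return the first capture group, or the input unchanged if there is no match."""
    m = pattern.match(name)
    return m.group(1).strip() if m else name

short_name = "ABC CO-OP ELEMENTARY SCHOOL"
long_name = ("ARMSTRONG CO-OP ELEMENTARY SCHOOL / "
             "ECOLE PRIMAIRE ARMSTRONG COOPERATIVE")

print(before_separator(short_name))    # ABC CO
print(pattern.match(short_name).group(2))  # CO (the alternative that was used)
print(before_separator(long_name))     # ARMSTRONG CO-OP ELEMENTARY SCHOOL
```

The second capture group shows which alternative ended up being used: `CO` for the first string, because the only hyphen or slash available is the one inside CO-OP.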

Why is CO matched in the alternation and used to split the string?

Best answer

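Your issue is that your regex requires there to be a - or a / in the string, so it is forced to split ABC CO-OP ELEMENTARY SCHOOL on the - in CO-OP. The engine tries CO-OP first, but no - or / follows it, so it backtracks to the CO alternative, which can be followed by the - inside CO-OP. If you: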

  1. make the second part of the regex optional;
  2. change the .* at the end of the first group to be lazy (.*?); and
  3. add start and end-of-string anchors

you will get the results you want:

^(.*(?<!\w)(?:CO-OP|CO|CO\.)(?!\w).*?)(?:[-/](\s*\w+.*))?$

Demo on regex101

Note also that the . in CO. should be escaped (CO\.), since an unescaped . matches any character rather than a literal dot.
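A quick way to check the revised pattern against both strings from the question is sketched below. It assumes Python's `re` module (the question does not name a language), and the helper name `split_name` is hypothetical.

```python
import re

# Revised pattern from this answer: optional tail, lazy .*?, anchors, escaped dot.
fixed = re.compile(
    r"^(.*(?<!\w)(?:CO-OP|CO|CO\.)(?!\w).*?)(?:[-/](\s*\w+.*))?$"
)

def split_name(name):
    """Return (part before the separator, part after it or None)."""
    m = fixed.match(name)
    if m is None:
        # No CO-OP / CO / CO. token in the name: leave it untouched.
        return name, None
    head, tail = m.group(1), m.group(2)
    return head.strip(), (tail.strip() if tail else None)

print(split_name("ABC CO-OP ELEMENTARY SCHOOL"))
# ('ABC CO-OP ELEMENTARY SCHOOL', None)

print(split_name(
    "ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE"
))
# ('ARMSTRONG CO-OP ELEMENTARY SCHOOL', 'ECOLE PRIMAIRE ARMSTRONG COOPERATIVE')
```

Because the tail group is optional, the CO-OP alternative can now succeed on its own, so the engine no longer needs to fall back to CO to find a separator.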