When using alternation in a regex, the alternatives should be listed in a deliberate order, because the engine is eager and takes the first alternative that matches.
So, given a list such as co, co., co-op, association, assoc, the longer, more specific items should come first to get the most precise match. The list should therefore be reordered to association, assoc, co-op, co., co.
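As a small illustration of that eagerness, here is a minimal Python sketch. The sample text and the sort-by-length helper are assumptions made for the example, not part of the original pattern; sorting by length is just one simple way to put longer items first.

```python
import re

text = "ABC CO-OP"

# Shorter alternative first: the engine accepts "CO" and never tries "CO-OP".
short_first = re.search(r"CO|CO-OP", text).group()

# Longer alternative first: "CO-OP" gets its chance before "CO".
long_first = re.search(r"CO-OP|CO", text).group()

print(short_first)  # CO
print(long_first)   # CO-OP

# Hypothetical helper: order a keyword list longest-first and build the alternation.
items = ["co", "co.", "co-op", "association", "assoc"]
ordered = sorted(items, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(i) for i in ordered), re.IGNORECASE)

print(pattern.search(text).group())  # CO-OP
```

`re.escape` is used so that the `.` in `co.` is treated as a literal period rather than a wildcard.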
I have a basic regex pattern to split a string in two if it contains a hyphen or slash, so that I keep just the part before the hyphen or slash:
(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)
However, this regex splits incorrectly when given ABC CO-OP ELEMENTARY SCHOOL: the string becomes just ABC CO. If I remove CO from the alternatives, the string is returned in its original form, ABC CO-OP ELEMENTARY SCHOOL, which is correct. In addition, the string ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE should be split to become ARMSTRONG CO-OP ELEMENTARY SCHOOL, without the text after the slash.
Why is CO matched in the alternation and used to split the string?
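Your issue is that your regex requires there to be a `-` or a `/` in the string, so it is forcing `ABC CO-OP ELEMENTARY SCHOOL` to split on the `-` in `CO-OP`: once `CO-OP` has matched, no separator is left, so the engine backtracks to the `CO` alternative and uses the hyphen inside `CO-OP` as the separator. If you:

1. change the `.*` at the end of the first group to be lazy (`.*?`); and
2. make the `[-/]` separator and the text after it optional, anchoring the pattern at the end of the string with `$`,

you will get the results you want.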
Demo on regex101
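A minimal Python sketch of those changes follows. The `FIXED` pattern is one possible reading of the changes described above, not necessarily the exact pattern from the demo, and the `head` helper and its whitespace stripping are assumptions added for the example.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)")

# Assumed fix: lazy tail in group 1, optional separator part, anchored with ^ and $,
# and the literal period escaped.
FIXED = re.compile(r"^(.*(?<!\w)(CO-OP|CO\.|CO)(?!\w).*?)(?:[-/](\s*\w+.*))?$")


def head(pattern, text):
    """Return group 1 without surrounding whitespace, or the text if nothing matches."""
    m = pattern.search(text)
    return m.group(1).strip() if m else text


a = "ABC CO-OP ELEMENTARY SCHOOL"
b = "ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE"

print(head(ORIGINAL, a))  # ABC CO
print(head(FIXED, a))     # ABC CO-OP ELEMENTARY SCHOOL
print(head(FIXED, b))     # ARMSTRONG CO-OP ELEMENTARY SCHOOL
```

The `$` anchor matters here: without it, the lazy `.*?` would stop immediately and the optional group would simply be skipped, leaving only `ABC CO-OP` in group 1.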
Note also that the `.` in `CO.` should be escaped.
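To see why the escape matters, here is a short Python sketch. The company names are made up for the example.

```python
import re

# Unescaped: "." matches any character, so "COX" is accepted as if it were "CO.".
unescaped = re.compile(r"(?<!\w)CO.(?!\w)")

# Escaped: only a literal period after "CO" is accepted.
escaped = re.compile(r"(?<!\w)CO\.(?!\w)")

print(unescaped.search("ACME COX LTD").group())  # COX
print(escaped.search("ACME COX LTD"))            # None
print(escaped.search("ACME CO. LTD").group())    # CO.
```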