When using alternation in a regex, the order of the alternatives matters, because the engine is eager: it takes the first alternative that matches. So given a list such as `co,co.,co-op,association,assoc`, the longer, more specific items should come first to get the most precise match, and the list should be changed to `association,assoc,co-op,co.,co`.
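To illustrate the ordering point, here is a minimal sketch using Python's `re` module (the choice of engine and the sample text `acme co-op` are assumptions made for illustration; the lists are the ones above, with the dot escaped):

```python
import re

text = "acme co-op"

# Short alternatives first: at the first position where any alternative
# matches, the engine takes the first one listed, so "co" wins over "co-op".
short_first = re.compile(r"co|co\.|co-op|association|assoc")

# Longer, more specific alternatives first.
long_first = re.compile(r"association|assoc|co-op|co\.|co")

print(short_first.search(text).group())
print(long_first.search(text).group())
```

With the short items first, the match stops at `co`; with the reordered list, the whole `co-op` is matched.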
I have a basic regex pattern that splits a string in two when it contains a hyphen or slash, so that I keep only the part before the hyphen or slash:
(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)
However, this regex breaks incorrectly on `ABC CO-OP ELEMENTARY SCHOOL`: the string becomes just `ABC CO`. If I remove `CO` from the alternatives, the string is returned in its original form, `ABC CO-OP ELEMENTARY SCHOOL`, which is correct. In addition, the string `ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE` should be reduced to `ARMSTRONG CO-OP ELEMENTARY SCHOOL`, without the text after the slash.

Why is `CO` matched in the alternation and used to break the string?
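The behaviour described above can be reproduced with a short sketch. Python's `re` module is an assumption here (the question does not name an engine); the pattern is copied unchanged from the question.

```python
import re

# Pattern copied from the question, unchanged (the dot in "CO." is unescaped).
pattern = re.compile(r"(.*(?<!\w)(CO-OP|CO|CO.)(?!\w).*)[-/](\s*\w+.*)")

m = pattern.search("ABC CO-OP ELEMENTARY SCHOOL")

# Group 1 is the part kept, group 2 the alternative that matched,
# group 3 the part after the separator.
before, keyword, after = m.groups()
print(repr(before), repr(keyword), repr(after))
```

The only `-` or `/` in the input is the hyphen inside `CO-OP`, so the overall match can only succeed if the alternation settles for `CO` and the hyphen is consumed as the separator.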
Your issue is that your regex requires there to be a `-` or a `/` in the string, so it is forcing `ABC CO-OP ELEMENTARY SCHOOL` to split on the `-` in `CO-OP`. If you change the `.*` at the end of the first group to be lazy (`.*?`) and no longer require the separator to be present, you will get the results you want:
Demo on regex101
Note also that the `.` in `CO.` should be escaped (`CO\.`); unescaped, it matches any character.
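A minimal sketch of one way to apply this advice, using Python's `re` module. The trailing `.*` of the first group is made lazy and the dot in `CO.` is escaped. The exact form of the rest is an assumption: the separator and the text after it are wrapped in an optional non-capturing group, and the pattern is anchored with `^` and `$`. The helper name `head` is hypothetical.

```python
import re

# Changes relative to the question's pattern:
#   - the trailing .* of group 1 is lazy (.*?)
#   - the dot in CO. is escaped (CO\.)
#   - assumption: the separator part is optional, and ^ / $ anchor the match
FIXED = re.compile(r"^(.*(?<!\w)(CO-OP|CO|CO\.)(?!\w).*?)(?:[-/](\s*\w+.*))?$")


def head(name: str) -> str:
    """Return the part before a trailing '-' or '/' separator, if any."""
    m = FIXED.match(name)
    # No keyword found: leave the input unchanged.
    if m is None:
        return name
    # Group 1 can end with the space that preceded the separator.
    return m.group(1).rstrip()


print(head("ABC CO-OP ELEMENTARY SCHOOL"))
print(head("ARMSTRONG CO-OP ELEMENTARY SCHOOL / ECOLE PRIMAIRE ARMSTRONG COOPERATIVE"))
```

Because the lazy `.*?` expands before the engine falls back to a shorter alternative, `CO-OP` stays intact in the first string, while the second string is cut at the ` / `.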