How to define this regex with alternation to have the same number of groups in the match?


I am trying to parse strings such as

99_GOG_A_X1_FOO X-2014-09
99_YAK_A_YZ1_BAR YZY-2014-10

with this regex

99_\w{3}_(A|B)_((X)(0*[1-9][0-9]?)_(FOO|BAR) X-(\b0*20(1[4-9]|[2-9][0-9])\b)-\b0*([1-9]|1[0-2])\b|(YZ)(0*[1-9][0-9]?)_(FOO|BAR) YZY-(\b0*20(1[4-9]|[2-9][0-9])\b)-\b0*([4]|1[0])\b)

The first input can only have 1 to 12 at the end, while the second input can only have 04 or 10. This works, but I would like a solution that returns only the groups that actually matched.

With this solution I get these groups: http://www.regexplanet.com/cookbook/ahJzfnJlZ2V4cGxhbmV0LWhyZHNyDwsSBlJlY2lwZRiwuuALDA/index.html

I have superfluous groups and the matching groups are not on the same indexes for both inputs.

Is there a way to get rid of the empty groups and to align the indexes?

Update: I have to keep the following rules. If the input matches _(X)(0*[1-9][0-9]?) it must also contain X- and allow this range at the end: \b0*([1-9]|1[0-2])\b

If the input matches _(YZ)(0*[1-9][0-9]?) it must also contain YZY- and allow only these values at the end: \b0*([4]|1[0])\b

So I want to combine these regex:

^99_\w{3}_(A|B)_(X)(0*[1-9][0-9]?)_(FOO|BAR) X-(\b0*20(1[4-9]|[2-9][0-9])\b)-\b0*([1-9]|1[0-2])\b$
^99_\w{3}_(A|B)_(YZ)(0*[1-9][0-9]?)_(FOO|BAR) YZY-(\b0*20(1[4-9]|[2-9][0-9])\b)-\b0*([4]|1[0])\b$

There are 1 best solutions below


From what I understand, this could work for you:

99_\w{3}_(A|B)_((X|(?>YZ))(0*[1-9][0-9]?)_(FOO|BAR) \3Y?-(\b0?20(1[4-9]|[2-9][0-9])\b)-\b0?((?:[1-9]|1[0-2])|(?:[4]|1[0]))\b$)


If this fails (which it might), you should consider using actual code to enforce the rules from your update: an input matching (X)(0*[1-9][0-9]?) must also contain X- and end in the month range allowed for X, and likewise for YZ with YZY-.

Use the capture groups to check whether \3 == 'X' or \3 == 'YZ', and then apply the rest of the regex as necessary.
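A minimal sketch of that code-based check, in Python. This is only one way to do it, under a few assumptions: the pattern below is a simplified rewrite rather than the regex above, so the group numbers differ (the X/YZ prefix is group 2 here, not \3), the year is captured without its inner group, and the names `PATTERN`, `RULES` and `parse` are made up for illustration. The regex only checks the shape shared by both inputs, so every match has the same seven groups; the prefix-specific rules are then applied in code.

```python
import re

# Shared shape of both inputs. There is no top-level alternation, so the
# group indexes are the same whichever kind of line matched.
# Groups: 1=A|B, 2=X|YZ, 3=number, 4=FOO|BAR, 5=X|YZY, 6=year, 7=month
PATTERN = re.compile(
    r"^99_\w{3}_(A|B)_(X|YZ)(0*[1-9][0-9]?)_(FOO|BAR) "
    r"(X|YZY)-0*(20(?:1[4-9]|[2-9][0-9]))-0*([1-9]|1[0-2])$"
)

# Hypothetical rule table: prefix -> (required tag before the date, allowed months).
RULES = {
    "X": ("X", set(range(1, 13))),   # X lines: "X-" and months 1..12
    "YZ": ("YZY", {4, 10}),          # YZ lines: "YZY-" and months 4 or 10 only
}


def parse(line):
    """Return the 7 captured groups, or None if the line breaks any rule."""
    m = PATTERN.match(line)
    if m is None:
        return None
    required_tag, allowed_months = RULES[m.group(2)]
    if m.group(5) != required_tag or int(m.group(7)) not in allowed_months:
        return None
    return m.groups()


if __name__ == "__main__":
    for text in ("99_GOG_A_X1_FOO X-2014-09", "99_YAK_A_YZ1_BAR YZY-2014-10"):
        print(text, "->", parse(text))
```

Because the rules live in a small table, changing the allowed months for one prefix does not touch the regex, and no empty groups appear in the result.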