I tried ^(a|b|c)?(a|b|c)*\1$ and some variations, but for some reason words like a or b aren't matched. Could you explain why?

I also don't get why the expression ^(a|b|c)?(a|b|c)\1$ matches the word cac. I thought a word matching this expression could only have 2 letters.

1
jhnc On BEST ANSWER

No string shorter than 2 characters can match ^(a|b|c)?(a|b|c)*\1$:

  • (a|b|c)* matches 0 or more characters
  • \1 is required and matches exactly 1 character
  • for \1 to match, the first (a|b|c) must have captured a character at the start of the line
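
A quick way to check this reasoning is to run the pattern against a few words. This is a minimal sketch assuming Python's `re` module, where a backreference to a group that did not take part in the match fails (some engines, such as JavaScript's, treat it as matching the empty string instead). The helper name `matches` is only for illustration.

```python
import re

# Pattern from the question: group 1 is optional, \1 refers back to it.
PATTERN = re.compile(r"^(a|b|c)?(a|b|c)*\1$")

def matches(word: str) -> bool:
    """Return True if the word matches PATTERN (hypothetical helper)."""
    return PATTERN.search(word) is not None

for word in ["a", "b", "ab", "aa", "aba", "cabc"]:
    print(word, matches(word))
# a False
# b False
# ab False
# aa True
# aba True
# cabc True
```

Only the words whose last letter repeats the first, and which are at least 2 letters long, are matched.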

^(a|b|c)?(a|b|c)\1$ can only be matched by strings of exactly length 3:

  • The second (a|b|c) matches exactly 1 character
  • \1 is required and matches exactly 1 character
  • for \1 to match, the first (a|b|c) must have captured a character at the start of the line
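
To see which part of cac each group takes, the captured groups can be printed. Again a small sketch assuming Python's `re` module; the names `PATTERN_FIXED_LEN` and `groups_for` are made up for this example.

```python
import re

# Second pattern from the question: no * on the middle group.
PATTERN_FIXED_LEN = re.compile(r"^(a|b|c)?(a|b|c)\1$")

def groups_for(word: str):
    """Return (group 1, group 2) if the word matches, else None."""
    m = PATTERN_FIXED_LEN.search(word)
    if m is None:
        return None
    return (m.group(1), m.group(2))

print(groups_for("cac"))   # ('c', 'a'): \1 then matches the final c
print(groups_for("ca"))    # None: 2 letters leave nothing for \1
print(groups_for("caca"))  # None: 4 letters is one too many
```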
0
Freeman On

Words like a or b are not matched because the expression requires the backreference \1 to match the character captured by the first group. Since the first group is optional, it can capture nothing, and then the backreference cannot match. If the first group does capture the only letter, nothing is left for \1 to match.

In the expression ^(a|b|c)?(a|b|c)\1$, the word cac is matched because the optional first group (a|b|c)? matches c, the second group (a|b|c) matches a, and the backreference \1 then matches the same c captured by the first group. So cac satisfies the condition that the last character repeats the first.

0
Nick On

@jhnc has explained the issues with your regex. If you want to match single-character strings, you will need to modify your regex to make the rest of the match optional:

^([abc])(?:[abc]*\1)?$

This will match:

  • ^ : beginning of string
  • ([abc]) : one of a, b, or c, captured in group 1
  • (?:[abc]*\1)? : an optional group consisting of 0 or more a, b or c followed by the character matched in group 1
  • $ : end of string
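
As a rough check of the pattern above, here is a short sketch in Python (assuming the `re` module; `is_match` is a hypothetical helper name) that runs it against single-letter words and a few longer ones.

```python
import re

# Group 1 is now mandatory; only the "middle + repeated letter" part is optional.
SUGGESTED = re.compile(r"^([abc])(?:[abc]*\1)?$")

def is_match(word: str) -> bool:
    """Return True if the word matches SUGGESTED."""
    return SUGGESTED.search(word) is not None

for word in ["a", "b", "aa", "aba", "abca", "ab", "abcb", ""]:
    print(repr(word), is_match(word))
# 'a' True
# 'b' True
# 'aa' True
# 'aba' True
# 'abca' True
# 'ab' False
# 'abcb' False
# '' False
```

Because group 1 always captures a character here, \1 is never evaluated against an unset group.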

Note that if you only want to match single characters, a character class is more efficient than an alternation.