Regex matching duplicates in a comma-separated list

I'm trying to use a regex to match any duplicate words (alphanumeric, possibly containing dashes) in a comma-separated list inside some YAML, using a PCRE tool.

I have found a regex that matches consecutive duplicates:

(?<=,|^)([^,]*)(,\1)+(?=,|$)

It catches the duplicates in:

hello-world,hello-world,goodbye-world,goodbye-world

but not the two hello-world entries in

hello-world,goodbye-world,goodbye-world,hello-world

Could someone help me try to build a regex pattern for the second case (or both cases)?
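
For reference, the behaviour described above can be reproduced with a minimal Python sketch. This rests on assumptions: Python's `re` module is not PCRE and requires fixed-width lookbehind, so `(?<=,|^)` is rewritten here as `(?:(?<=,)|^)`, and the helper name `consecutive_duplicates` is made up for illustration.

```python
import re

# Assumption: Python's `re` rejects the variable-width lookbehind (?<=,|^),
# so it is written as (?:(?<=,)|^), which should assert the same thing.
CONSECUTIVE = re.compile(r"(?:(?<=,)|^)([^,]*)(,\1)+(?=,|$)")


def consecutive_duplicates(line):
    """Return the items the pattern reports as repeated back to back."""
    return [m.group(1) for m in CONSECUTIVE.finditer(line)]


# Both pairs are adjacent, so both are found.
print(consecutive_duplicates("hello-world,hello-world,goodbye-world,goodbye-world"))
# ['hello-world', 'goodbye-world']

# The hello-world entries are separated by other items, so only goodbye-world is found.
print(consecutive_duplicates("hello-world,goodbye-world,goodbye-world,hello-world"))
# ['goodbye-world']
```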

There are 2 best solutions below

Best answer

You may use this regex:

(?<=,|^)([^,]+)(?=(?>,[^,]*)*,\1(?>,|$)),

RegEx Details:

  • (?<=,|^): Assert that the current position is preceded by a comma or is the start position
  • ([^,]+): Match one or more non-comma characters and capture them in group #1
  • (?=(?>,[^,]*)*,\1(?>,|$)): Lookahead asserting that the value captured in group #1 appears again later as a complete item; (?>,[^,]*)* skips over any items in between
  • ,: Match the comma that follows the item
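
The breakdown above can be tried out with a small Python sketch. It is an adaptation under stated assumptions, not the PCRE pattern verbatim: Python's `re` requires fixed-width lookbehind, so `(?<=,|^)` becomes `(?:(?<=,)|^)`, and the atomic groups `(?>...)` are swapped for plain non-capturing groups `(?:...)` because atomic groups are only available from Python 3.11. For these inputs that swap should change only backtracking cost, not what matches. The helper name `duplicated_items` is hypothetical.

```python
import re

# Adapted from the PCRE pattern above (assumptions: fixed-width lookbehind
# rewrite, and (?:...) in place of the atomic groups (?>...)).
DUPLICATE = re.compile(r"(?:(?<=,)|^)([^,]+)(?=(?:,[^,]*)*,\1(?:,|$)),")


def duplicated_items(line):
    """Return each item that occurs again later in the comma-separated line."""
    return [m.group(1) for m in DUPLICATE.finditer(line)]


# Adjacent duplicates are found.
print(duplicated_items("hello-world,hello-world,goodbye-world,goodbye-world"))
# ['hello-world', 'goodbye-world']

# Non-adjacent duplicates are found as well.
print(duplicated_items("hello-world,goodbye-world,goodbye-world,hello-world"))
# ['hello-world', 'goodbye-world']

# A shared prefix is not a duplicate, because \1 must be followed by , or the end.
print(duplicated_items("hello,hello-world"))
# []
```

Each match covers an earlier occurrence of a repeated item, so the final occurrence is never reported on its own.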

Second answer

Put an optional ,.* between the capture group and the back-reference.

(?<=,|^)([^,]*)(?:,.*)?(,\1)(?=,|$)
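
A quick way to see what this pattern matches is a small Python sketch. As an assumption for Python's `re`, which needs fixed-width lookbehind, `(?<=,|^)` is written as `(?:(?<=,)|^)`; the helper name `first_duplicate` is hypothetical.

```python
import re

# Assumption: lookbehind rewritten as (?:(?<=,)|^) for Python's `re`.
ANYWHERE = re.compile(r"(?:(?<=,)|^)([^,]*)(?:,.*)?(,\1)(?=,|$)")


def first_duplicate(line):
    """Return the first match object for a repeated item, or None."""
    return ANYWHERE.search(line)


line = "hello-world,goodbye-world,goodbye-world,hello-world"
m = first_duplicate(line)
print(m.group(1))          # hello-world
print(m.group(0) == line)  # True: the match runs from the first copy to the last
```

Because `.*` is greedy, one match spans everything from the first occurrence to the last, so in this input the goodbye-world pair inside that span is not reported as a separate match.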
