Overlapping matches in Regex - Scala


I'm trying to extract all possible 3-letter combinations from a String that follow the pattern XYX.

val text = "abaca dedfd ghgig"
val p = """([a-z])(?!\1)[a-z]\1""".r
p.findAllIn(text).toArray

When I run the script I get:

aba, ded, ghg

And it should be:

aba, aca, ded, dfd, ghg, gig

It does not detect overlapping combinations.


There are 2 best solutions below

2
On BEST ANSWER

You need to capture the whole pattern and put it inside a positive lookahead. The code in Scala will be the following:

object Main extends App {
    val text = "abaca dedfd ghgig"
    val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r
    val allMatches = p.findAllMatchIn(text).map(_.group(1))
    println(allMatches.mkString(", "))
    // => aba, aca, ded, dfd, ghg, gig
}


Note that the backreference becomes \2: the outer group that wraps the whole pattern is now Group 1 and holds the value you need to collect, so the single-letter group being checked moves to Group 2.
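To make the group numbering concrete, here is a minimal sketch that prints both groups side by side. The names `pattern` and `groups` and the shortened input "abaca" are only illustrative; the regex is the one from the answer above.

```scala
import scala.util.matching.Regex

// Groups are numbered by the position of their opening parenthesis:
// group 1 = the whole XYX triple, group 2 = the first letter X.
val pattern: Regex = """(?=(([a-z])(?!\2)[a-z]\2))""".r

// For each match, pair the full triple (group 1) with its outer letter (group 2).
val groups: List[(String, String)] =
  pattern.findAllMatchIn("abaca").map(m => (m.group(1), m.group(2))).toList
// groups == List(("aba", "a"), ("aca", "a"))
```

If the backreference were left as \1, it would point at the outer group, which has not finished matching at that point, so the pattern would not behave as intended.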

2
On

The approach is to enclose the whole pattern in a lookahead, so that each match consumes no characters and only marks a start position:

val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r
p.findAllIn(text).matchData foreach {
   m => println(m.group(1))
}

The lookahead is only an assertion (a test) at the current position, and the pattern inside it doesn't consume characters. The result you are looking for is in the first capture group, which is needed because the whole match itself is empty.
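The zero-width behaviour can be checked directly. The sketch below, with the illustrative names `sample`, `lookahead` and `details`, records the start index, the whole match and group 1 for every match on the string from the question:

```scala
val sample = "abaca dedfd ghgig"
val lookahead = """(?=(([a-z])(?!\2)[a-z]\2))""".r

// (start index, whole match, group 1) for every match.
// The whole match is always the empty string; only group 1 carries text.
val details: List[(Int, String, String)] =
  lookahead.findAllMatchIn(sample).map(m => (m.start, m.matched, m.group(1))).toList
// details == List((0, "", "aba"), (2, "", "aca"), (6, "", "ded"),
//                 (8, "", "dfd"), (12, "", "ghg"), (14, "", "gig"))
```

Because each match is empty, the matcher moves forward by only one character before searching again. That is why the triple starting at index 2 is found even though it overlaps the one starting at index 0.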