Why doesn't parser combinator backtrack?

1.5k Views Asked by At

Consider

import util.parsing.combinator._
object TreeParser extends JavaTokenParsers {

    lazy val expr: Parser[String] = decimalNumber | sum
                                                  //> expr: => TreeParser.Parser[String]
    lazy val sum: Parser[String] = expr ~ "+" ~ expr ^^ {case a ~ plus ~ b => s"($a)+($b)"}
                                                  //> sum: => TreeParser.Parser[String]
    println(parseAll(expr, "1 + 1"))                       //> TreeParser.ParseResult[String] = [1.3] failure: string matching regex 
                                              //| `\z' expected but `+' found
}

The same story with fastparse

import fastparse.all._
val expr: P[Any] = P("1" | sum)
val sum: P[Any] = expr ~ "+" ~ expr
val top = expr ~ End
println(top.parse("1+1")) // Failure(End:1:2 ..."+1")

Parsers are great to discover that taking the first literal is a bad idea but do not try to fall back to the sum production. Why?

I understand that parser takes the first branch that can successfully eat up a part of input string and exits. Here, "1" of expression matches the first input char and parsing completes. In order to grab more, we need to make sum the first alternative. However, plain stupid

lazy val expr: Parser[String] = sum | "1"

endы up with stack overflow. The library authors therefore approach it from another side

val sum: P[Any] = P( num ~ ("+".! ~/ num).rep )
val top: P[Any]   = P( sum ~ End )

Here, we start sum with terminal, which is fine but this syntax is more verbose and, furthermore, it produces a terminal, followed by a list, which is good for a reduction operator, like sum, but is difficult to map to a series of binary operators.

What if your language defines expression, which admits a binary operator? You want to match every occurrence of expr op expr and map it to a corresponding tree node

expr ~ "op" ~ expr ^^ {case a ~ _ ~ b => BinOp(a,b)"} 

How do you do that? In short, I want a greedy parser, that consumes the whole string. This is what I mean by 'greedy' rather than greedy algorigthm that jumps into the first wagon and ends up in a dead end.

2

There are 2 best solutions below

5
On

The parser does backtrack. Try val expr: P[String] = P(("1" | "1" ~ "+" ~ "1").!) and expr.parse("1+1") for example.

The problem is in your grammar. expr parses 1 and it is a successful parsing by your definition. Then sum fails and now you want to blame the dutiful expr for what happened?

There are plenty of examples on how to deal with binary operators. For example, the first example here: http://lihaoyi.github.io/fastparse/

0
On

As I have found here, we need to replace | alternative operator with secret |||

//lazy val expr: Parser[String] = decimalNumber | sum
lazy val backtrackGreedy: Parser[String] =  decimalNumber ||| sum

lazy val sum: Parser[String] = decimalNumber ~ "+" ~ backtrackGreedy ^^ {case a ~ plus ~ b => s"($a)+($b)"}

println(parseAll(backtrackGreedy, "1 + 1")) // [1.6] parsed: (1)+(1)

The order of alternatives does not matter with this operator. To stop stack overflow, we need to eliminate the left recursion, sum = expr + expr => sum = number + expr.

Another answer says that we need to normalize, that is instead of

  def foo = "foo" | "fo"
  def obar = "obar"

  def foobar = foo ~ obar

we need to use

def workingFooBar = ("foo" ~ obar) | ("fo" ~ obar)

But first solution is more striking.