How to correctly handle misplaced keywords in FParsec preserving FParsec's inbuit error messages and positions?

59 Views Asked by At

This is kind of duplicate to FParsec identifiers vs keywords, but the answer there isn't helpful in my case.

A common problem in writing parsers for programming languages is handling syntax errors that occur when keywords are misplaced in the code.

I have seen two approaches to this problem in FParsec, but none is really satisfying.

Approach 1: Write a grammar that allows choices between keywords and other literals where a conflict is possible (this is the accepted solution of the above-mentioned duplicate).

In my grammar, there are contexts where keywords never occur. So, rewriting my grammar to allow choices between literals and keywords in these contexts is not really an option. Otherwise, I would allow my parser to accept input that is actually a syntax error.

Approach 2: Identify literals that might conflict with keywords and let their parsers fail if they consume input that was a keyword.

There are two negative side effects of this approach I want to demonstrate in the following example:


open FParsec

type Ast = 
    | Var of string
    | Number of string
    | Assignment of Ast * Ast

let assign = skipString ":=" >>. spaces

let number = regex "\d+" <?> "number" |>> Ast.Number


let var = 
    regex "[a-z]\w*" 
    .>> spaces 
    <?> "variable" 
    >>= (fun s -> 
        if List.contains s ["for"; "if"; "else"] then // check if s is a keyword
            fail "variable expected" // fail if s is a keyword
        else
            preturn (Ast.Var s) // return Ast.Var s otherwise
        )


let varOrNumber = choice [ var ; number ] 

let assignment = (var .>> assign) .>>. varOrNumber |>> Ast.Assignment

let parser = assignment


let p1 = run parser "x := y"
printfn "%O" p1

let p2 = run parser "x := 42"
printfn "%O" p2

printfn "---------"
let p3 = run parser "# := y"
printfn "failure p3 (correct message and position)\n%O" p3

printfn "---------"
let p4 = run parser "for := y"
printfn "failure p4 (wrong message and position)\n%O" p4

printfn "---------"
let p5 = run parser "x := #"
printfn "failure p5 (correct message and position)\n%O" p5

printfn "---------"
let p6 = run parser "x := for"
printfn "failure p6 (wrong message and position)\n%O" p6

This will output

Success: Assignment (Var "x", Var "y")
Success: Assignment (Var "x", Number "42")
---------
failure p3 (correct message and position)
Failure:
Error in Ln: 1 Col: 1
# := y
^
Expecting: variable

---------
failure p4 (wrong message and position)
Failure:
Error in Ln: 1 Col: 5
for := y
    ^
variable expected

---------
failure p5 (correct message and position)
Failure:
Error in Ln: 1 Col: 6
x := #
     ^
Expecting: number or variable

---------
failure p6 (wrong message and position)
Failure:
Error in Ln: 1 Col: 9
x := for
        ^
Note: The error occurred at the end of the input stream.
variable expected

In this example, the parser var has been modified to fail if the consumed input was some keyword.

Note the disadvantages of this approach:

  1. It is impossible to create a custom error message that would match every possible error context of the grammar: In the simple examples p3 and p4, the grammar expects a variable, while in p5 and p6, the grammar expects either a variable or a number. While p3 and p5 output the correct FParsec error messages, p4 and p6 output the custom error message that is (only accidentally) correct in the context of p4.
  2. The lambda function fun s failing if s is a keyword consumes the input of the keyword so the position of the error shifts to the first character after the misplaced keyword. This position has nothing to do with the error because, actually, the error should occur before consuming the keyword, not thereafter. In my opinion, an erroneous input should never be consumed by a parser, even if it fails.

Is there a way to write an FParsec parser that would correctly identify misplaced keywords in the input and still preserve the original FPArsec error messages and positions?

1

There are 1 best solutions below

2
On BEST ANSWER

The FParsec documentation describes a handy parser called resultSatisfies that backtracks the way you want:

let resultSatisfies predicate msg (p: Parser<_,_>) : Parser<_,_> =
    let error = messageError msg
    fun stream ->
      let state = stream.State
      let reply = p stream
      if reply.Status <> Ok || predicate reply.Result then reply
      else
          stream.BacktrackTo(state) // backtrack to beginning
          Reply(Error, error)

In your case, I think you could use it like this:

let var = 
    regex "[a-z]\w*" 
    .>> spaces 
    <?> "variable"
    |> resultSatisfies (fun s ->
        List.contains s ["for"; "if"; "else"] |> not) "variable expected"
    |>> Ast.Var

I haven't tried to vary the error message by context. I think that's something you would have to work on yourself.

Output is:

Success: Assignment (Var "x", Var "y")
Success: Assignment (Var "x", Number "42")
---------
failure p3 (correct message and position)
Failure:
Error in Ln: 1 Col: 1
# := y
^
Expecting: variable

---------
failure p4 (wrong message and position)
Failure:
Error in Ln: 1 Col: 1
for := y
^
variable expected

---------
failure p5 (correct message and position)
Failure:
Error in Ln: 1 Col: 6
x := #
     ^
Expecting: number or variable

---------
failure p6 (wrong message and position)
Failure:
Error in Ln: 1 Col: 6
x := for
     ^
Expecting: number
Other error messages:
  variable expected