How to fix? Xtext grammar stops parsing with 'no viable alternative at input ...' on incorrect input


As an Xtext and Antlr newbie, I'm struggling to get an error-tolerant Xtext grammar for a very simple subset of a (not JVM-related) language I want to parse.

A document in this mini-language could look like this:

$c wff |- $.
$c class $.
$c set $.

So a document is a sequence of statements, each delimited by the $c and $. keywords, with one or more words in between that may not contain $. Everything is separated by mandatory whitespace.

The best I can come up with is the following grammar:

grammar mm.ecxt.MMLanguage

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate mmLanguage "urn:marnix:mm.exct/MMLanguage"

MMDatabase:
    WS? (statements+=statement WS)* statements+=statement WS?;

statement:
    DOLLAR_C WS (symbols+=MATHSYMBOL WS)+ DOLLAR_DOT;

terminal DOLLAR_C: '$c';
terminal DOLLAR_DOT: '$.';
terminal MATHSYMBOL: 
      ('!'..'#'|'%'..'~')+; /* everything except '$' */

terminal WS : (' '|'\t'|'\r'|'\n')+;

terminal WORD: ('!'..'~')+;

On valid input this grammar works fine. However, on invalid input, like

$c class $.
$c $.
$c set $.
$c x$u $.

there is just one error (no viable alternative at input '$.'), and after that it looks like parsing just stops: no more errors are detected, and the model just contains the correct statements before the error (here only the class statement).

I tried all kinds of variations (using =>, with/without terminal declarations, enabling backtracking, and more) but all I get is no viable alternative at input ....

So my question is: How should I write a grammar for this language so that Antlr does some form of error recovery? Or is there something else that I'm doing wrong?

From, e.g., http://zarnekow.blogspot.de/2012/11/xtext-corner-7-parser-error-recovery.html I expected that this would work out of the box. Or is this because I'm not using a Java/C-like grammar based on Xbase?

Best answer, by Sebastian Zarnekow:

What seems to happen here is that in line 2 of your sample input ($c $.), two tokens are missing according to your grammar: the parser expects a (symbols+=MATHSYMBOL WS)+ but gets $. instead. Antlr will happily try to recover using different strategies; some work locally, others work on a per-parser-rule basis. However, Antlr will not insert two recovery tokens to finish the rule statement; it bails out from there. After the statement, a mandatory WS is expected, but the parser sees $. so it bails out again. That's why it appears not to recover at all. All of this is more or less an educated guess, though.

What will help, though, is a minor grammar refactoring that makes the grammar less strict than it currently is. Some optional tokens will help the parser recover:

MMDatabase:
    WS? (statements+=statement WS?)*;

statement:
    DOLLAR_C WS (symbols+=MATHSYMBOL WS?)* DOLLAR_DOT;
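
For reference, here is a sketch of how the complete grammar might look with these two relaxed rules dropped in. The header and the terminal rules are copied unchanged from the question; only MMDatabase and statement differ. It has not been run against the sample input here, so treat it as a starting point rather than a verified fix.

```xtext
grammar mm.ecxt.MMLanguage

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate mmLanguage "urn:marnix:mm.exct/MMLanguage"

// Relaxed: whitespace between statements is optional and the
// statement list may be empty, so a broken statement does not
// invalidate everything that follows it.
MMDatabase:
    WS? (statements+=statement WS?)*;

// Relaxed: zero or more symbols, each optionally followed by WS.
// The empty statement "$c $." now parses instead of aborting the rule.
statement:
    DOLLAR_C WS (symbols+=MATHSYMBOL WS?)* DOLLAR_DOT;

// Terminals exactly as in the question.
terminal DOLLAR_C: '$c';
terminal DOLLAR_DOT: '$.';
terminal MATHSYMBOL:
      ('!'..'#'|'%'..'~')+; /* everything except '$' */

terminal WS : (' '|'\t'|'\r'|'\n')+;

terminal WORD: ('!'..'~')+;
```

One consequence of this design is that the grammar itself no longer rejects a statement without symbols, since (...)* also matches zero repetitions. If "at least one symbol" should still be reported as an error, that check would have to move out of the grammar, for example into a validation rule on the parsed model.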