Why does parsing this program with BNFC fail?

Question

Asked by kitek on 17 May 2015 at 01:50 · 723 views

Given the following grammar:

comment "/*" "*/" ;

TInt.  Type1 ::= "int" ;
TBool. Type1 ::= "bool" ;
coercions Type 1 ;


BTrue.   BExp   ::= "true" ;
BFalse.  BExp   ::= "false" ;

EOr.     Exp    ::= Exp  "||" Exp1 ;
EAnd.    Exp1   ::= Exp1 "&&" Exp2 ;
EEq.     Exp2   ::= Exp2 "==" Exp3 ;
ENeq.    Exp2   ::= Exp2 "!=" Exp3 ;
ELt.     Exp3   ::= Exp3 "<" Exp4 ;
EGt.     Exp3   ::= Exp3 ">" Exp4 ;
ELte.    Exp3   ::= Exp3 "<=" Exp4 ;
EGte.    Exp3   ::= Exp3 ">=" Exp4 ;
EAdd.    Exp4   ::= Exp4 "+" Exp5 ;
ESub.    Exp4   ::= Exp4 "-" Exp5 ;
EMul.    Exp5   ::= Exp5 "*" Exp6 ;
EDiv.    Exp5   ::= Exp5 "/" Exp6 ;
EMod.    Exp5   ::= Exp5 "%" Exp6 ;
ENot.    Exp6   ::= "!" Exp ;
EVar.    Exp8   ::= Ident ;
EInt.    Exp8   ::= Integer ;
EBool.   Exp8   ::= BExp ;
EIver.   Exp8   ::= "[" Exp "]" ;
coercions Exp 8 ;

Decl. Decl ::= Ident ":" Type ;
terminator Decl ";" ;


LIdent.  Lvalue ::= Ident ;

SBlock.  Stm ::= "{" [Decl] [Stm] "}" ;
SExp.    Stm ::= Exp ";" ;
SWhile.  Stm ::= "while" "(" Exp ")" Stm ;
SReturn. Stm ::= "return" Exp ";" ;
SAssign. Stm ::= Lvalue "=" Exp ";" ;
SPrint.  Stm ::= "print" Exp ";" ;
SIf.     Stm ::= "if" "(" Exp ")" "then" Stm "endif" ;
SIfElse. Stm ::= "if" "(" Exp ")" "then" Stm "else" Stm "endif" ;

terminator Stm "" ;

entrypoints Stm;

the parser created with BNFC fails to parse

{ c = a; }

although it parses

c = a;

or

{ print a; c = a; }

I think the problem could be that the parser sees an Ident and doesn't know whether it starts a declaration or a statement (LR stuff, etc.), although one token of lookahead should be enough, shouldn't it? However, I couldn't find any note in the BNFC documentation saying that it doesn't work for all grammars.

Any ideas how to get this working?

**rici** · Accepted Answer · 2015-05-17T05:31:14.703000

I would think you would get a shift/reduce conflict report for that grammar, although where that error message shows up might well depend on which tool BNFC is using to generate the parser. As far as I know, all the backend tools have the same approach to dealing with shift/reduce conflicts, which is to (1) warn the user about the conflict, and then (2) resolve the conflict in favour of shifting.

The problematic production is this one (I've left out type annotations to reduce clutter):

Stm ::= "{" [Decl] [Stm] "}" ;

Here, [Decl] and [Stm] are macros, which automatically produce definitions for the non-terminals with those names (or something equivalent which will be accepted by the backend tool). Specifically, the automatically-produced productions are:

[Decl] ::= /* empty */
       |   Decl ';' [Decl]

[Stm]  ::= /* empty */
       |   Stm [Stm]

(The ; in the first rule is the result of a "terminator" declaration. I don't know why BNFC generates right-recursive rules, but that's how I interpret the reference manual -- after a very quick glance -- and I'm sure they have their reasons. For the purpose of this problem, it doesn't matter.)
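
For comparison, here is the same expansion in LBNF's own notation. This is a sketch that assumes the reference manual's description of the terminator shorthand, in which [] and (:) are the predefined labels for the empty list and for cons:

```lbnf
-- Assumed expansion of:  terminator Decl ";" ;
[].  [Decl] ::= ;
(:). [Decl] ::= Decl ";" [Decl] ;

-- Assumed expansion of:  terminator Stm "" ;
[].  [Stm]  ::= ;
(:). [Stm]  ::= Stm [Stm] ;
```

The empty [Decl] rule is the one that matters here: the parser has to decide whether to reduce it before it has seen anything beyond the first Ident.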

What's important is that both Decl and Stm can start with an Ident. So let's suppose we're parsing { id ..., which might be { id : ... or { id = ..., but we've only read the { and the lookahead token id. So there are two possibilities:

1. id is the start of a Decl. We should shift the Ident and go to the state which includes Decl → Ident • ':' Type.
2. id is the start of a Stm. In this case, we need to reduce the empty production [Decl] → • before we shift Ident into a Stm production.

So we have a shift/reduce conflict, because we cannot see the second next token (either : or =). And, as mentioned above, shift usually wins in this case, so the LR(1) parser will commit itself to expect a Decl. Consequently, { a = b ; } will fail.

An LR(2) parser generator would do fine with this grammar, but those are much harder to find. (Modern bison can produce GLR parsers, which are even more powerful than LR(2) at the cost of a bit of extra compute time, but not the version required by the BNFC tool.)

Possible solutions

1. Allow declarations to be intermingled with statements. This one is my preference. It is simple, and many programmers expect to be able to declare a variable at first use rather than at the beginning of the enclosing block.
2. Make the declaration recognizable from the first token, either by putting the type first (as in C) or by adding a keyword such as var (as in JavaScript).
3. Modify the grammar to defer the lookahead. It is always possible to find an LR(1) grammar for any LR(k) language (provided k is finite), but it can be tedious. An ugly but effective alternative is to continue the lexical scan until either a : or some other non-whitespace character is found, so that id : gets tokenized as IdentDefine or some such. (This is the solution used by bison, as it happens. It means that you can't put comments between an identifier and the following :, but there are few, if any, good reasons to put a comment in that context.)
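
The first two options can be sketched as small changes to the grammar in the question. This is only an illustration: the label SDecl and the keyword "var" are made up for the sketch, the two options are alternatives and should not be combined, and neither has been run through BNFC here. Either one should remove the conflict described above, but the rest of the grammar may still produce other conflict warnings.

```lbnf
-- Option A (hypothetical): declarations may appear wherever a statement can.
-- These two rules replace the original SBlock rule; SDecl is an invented label.
-- The "terminator Decl" declaration is then unused and can be dropped.
SBlock.  Stm ::= "{" [Stm] "}" ;
SDecl.   Stm ::= Decl ";" ;

-- Option B (hypothetical, instead of Option A): keep the original SBlock rule
-- and start every declaration with a keyword, replacing the original Decl rule.
-- It is commented out so that this fragment does not define two variants at once.
-- Decl.  Decl ::= "var" Ident ":" Type ;
```

With Option A, the parser shifts the Ident and only then looks at the next token (: or =), so nothing has to be reduced before the identifier is read. With Option B, a block such as { var c : int ; c = a; } starts its declarations with a token that no statement can begin with, so one token of lookahead is enough to decide whether to reduce the empty [Decl].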
