LBNF, C function declaration / definition, reduce reduce conflict

250 Views Asked by At

I'm trying to represent in LBNF that C/C++ function declarations have the following (approximate) form (<sym> denotes optionallity, and [rule] is a zero-or-more list):

type ident ( [type <id>] );

While function definitions have the form:

type ident ( [type id] ) { [stmt] }

Currently, I have the following LBNF:

entrypoints Program ;
Program.   Program ::= [TopDef] ;

FnDef.     TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl.    TopDef ::= Type Ident "(" [Par] ")" ";" ;
separator nonempty TopDef "" ;

Param.     Par ::= Type ;
NParam.    Par ::= Type Ident ;
separator Par "," ;

Arg.       Arg ::= Type Ident;
separator  Arg "," ;

-- Types ---------------------------------------------------

Int.       Type ::= "int" ;
--- more types...
separator  Type "," ;

Which causes reduce/reduce conflicts as expected. Is there a way to solve this in the parser / lexer?

I could solve it with the following grammar:

FnDef.     TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl.    TopDef ::= Type Ident "(" [Arg] ")" ";" ;
separator nonempty TopDef "" ;

Arg.       Arg ::= Type Ident;
Arg.       Arg ::= Type;
separator  Arg "," ;

And then check in the type checker that function definitions have identifiers for each argument, but this feels less satisfying...

How is this typically handled in a language like C ?

1

There are 1 best solutions below

1
On BEST ANSWER

The way you mention which you find unsatisfying is, in fact, the way it's usually accomplished.

But you can produce an LALR(1) grammar which is precise. Here's a complete bison specification without conflicts:

%token TYPE ID
%%
prog       : 
           | prog decl ';'
decl       : TYPE ID def_list block
           | TYPE ID def_list ';'
           | TYPE ID dec_list ';'
block      : '{' prog '}'
def_list   : '(' ')'
           | '(' type_ids ')'
dec_list   : '(' type_opt_ids ')'
type_opt_id: type_only
           | type_id
type_ids   : type_id
           | type_ids ',' type_id
type_opt_ids
           : type_only
           | type_ids ',' type_only  /* SEE BELOW */
           | type_opt_ids ',' type_opt_id
type_id    : TYPE ID
type_only  : TYPE        

The key is to avoid forcing the parser to make a decision. As it is passing over the list of parameters, it can keep reducing a type_opt_ids as long as it doesn't hit an anonymous parameter. If it hits one, it reduces a type_ids and continues for the rest of the parameters, whether they are anonymous or not. At the end, the definition only allows type_ids but the declaration (explicitly) accepts either.

To make that work, the semantic type for type_id and type_only would need to be the same, since both type_ids and type_opt_ids must be lists of that type. Otherwise, you would need to convert type_ids to type_opt_ids in the production marked with /* SEE BELOW */

(I apologize for not converting that to your formalism, but I wanted to test it with bison to verify that it is, in fact, conflict-free. It should be easy enough to convert.)


Note that C++ is happy to allow function definitions without parameter names, but C is not. On the other hand, C allows function definitions without types, or with the type declarations between the parameter-name list and the body. But that's a side issue, really.