LBNF, C function declaration / definition, reduce reduce conflict

281 Views Asked by At

I'm trying to represent in LBNF that C/C++ function declarations have the following (approximate) form (<sym> denotes optionallity, and [rule] is a zero-or-more list):

type ident ( [type <id>] );

While function definitions have the form:

type ident ( [type id] ) { [stmt] }

Currently, I have the following LBNF:

entrypoints Program ;
Program.   Program ::= [TopDef] ;

FnDef.     TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl.    TopDef ::= Type Ident "(" [Par] ")" ";" ;
separator nonempty TopDef "" ;

Param.     Par ::= Type ;
NParam.    Par ::= Type Ident ;
separator Par "," ;

Arg.       Arg ::= Type Ident;
separator  Arg "," ;

-- Types ---------------------------------------------------

Int.       Type ::= "int" ;
--- more types...
separator  Type "," ;

Which causes reduce/reduce conflicts as expected. Is there a way to solve this in the parser / lexer?

I could solve it with the following grammar:

FnDef.     TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl.    TopDef ::= Type Ident "(" [Arg] ")" ";" ;
separator nonempty TopDef "" ;

Arg.       Arg ::= Type Ident;
Arg.       Arg ::= Type;
separator  Arg "," ;

And then check in the type checker that function definitions have identifiers for each argument, but this feels less satisfying...

How is this typically handled in a language like C ?

1

There are 1 best solutions below

1
rici On BEST ANSWER

The way you mention which you find unsatisfying is, in fact, the way it's usually accomplished.

But you can produce an LALR(1) grammar which is precise. Here's a complete bison specification without conflicts:

%token TYPE ID
%%
prog       : 
           | prog decl ';'
decl       : TYPE ID def_list block
           | TYPE ID def_list ';'
           | TYPE ID dec_list ';'
block      : '{' prog '}'
def_list   : '(' ')'
           | '(' type_ids ')'
dec_list   : '(' type_opt_ids ')'
type_opt_id: type_only
           | type_id
type_ids   : type_id
           | type_ids ',' type_id
type_opt_ids
           : type_only
           | type_ids ',' type_only  /* SEE BELOW */
           | type_opt_ids ',' type_opt_id
type_id    : TYPE ID
type_only  : TYPE        

The key is to avoid forcing the parser to make a decision. As it is passing over the list of parameters, it can keep reducing a type_opt_ids as long as it doesn't hit an anonymous parameter. If it hits one, it reduces a type_ids and continues for the rest of the parameters, whether they are anonymous or not. At the end, the definition only allows type_ids but the declaration (explicitly) accepts either.

To make that work, the semantic type for type_id and type_only would need to be the same, since both type_ids and type_opt_ids must be lists of that type. Otherwise, you would need to convert type_ids to type_opt_ids in the production marked with /* SEE BELOW */

(I apologize for not converting that to your formalism, but I wanted to test it with bison to verify that it is, in fact, conflict-free. It should be easy enough to convert.)


Note that C++ is happy to allow function definitions without parameter names, but C is not. On the other hand, C allows function definitions without types, or with the type declarations between the parameter-name list and the body. But that's a side issue, really.