I'm trying to represent in LBNF that C/C++ function declarations have the following (approximate) form (<sym>
denotes optionallity, and [rule] is a zero-or-more list):
type ident ( [type <id>] );
While function definitions have the form:
type ident ( [type id] ) { [stmt] }
Currently, I have the following LBNF:
entrypoints Program ;
Program. Program ::= [TopDef] ;
FnDef. TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl. TopDef ::= Type Ident "(" [Par] ")" ";" ;
separator nonempty TopDef "" ;
Param. Par ::= Type ;
NParam. Par ::= Type Ident ;
separator Par "," ;
Arg. Arg ::= Type Ident;
separator Arg "," ;
-- Types ---------------------------------------------------
Int. Type ::= "int" ;
--- more types...
separator Type "," ;
Which causes reduce/reduce conflicts as expected. Is there a way to solve this in the parser / lexer?
I could solve it with the following grammar:
FnDef. TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl. TopDef ::= Type Ident "(" [Arg] ")" ";" ;
separator nonempty TopDef "" ;
Arg. Arg ::= Type Ident;
Arg. Arg ::= Type;
separator Arg "," ;
And then check in the type checker that function definitions have identifiers for each argument, but this feels less satisfying...
How is this typically handled in a language like C ?
The way you mention which you find unsatisfying is, in fact, the way it's usually accomplished.
But you can produce an LALR(1) grammar which is precise. Here's a complete bison specification without conflicts:
The key is to avoid forcing the parser to make a decision. As it is passing over the list of parameters, it can keep reducing a
type_opt_ids
as long as it doesn't hit an anonymous parameter. If it hits one, it reduces atype_ids
and continues for the rest of the parameters, whether they are anonymous or not. At the end, the definition only allowstype_ids
but the declaration (explicitly) accepts either.To make that work, the semantic type for
type_id
andtype_only
would need to be the same, since bothtype_ids
andtype_opt_ids
must be lists of that type. Otherwise, you would need to converttype_ids
totype_opt_ids
in the production marked with/* SEE BELOW */
(I apologize for not converting that to your formalism, but I wanted to test it with bison to verify that it is, in fact, conflict-free. It should be easy enough to convert.)
Note that C++ is happy to allow function definitions without parameter names, but C is not. On the other hand, C allows function definitions without types, or with the type declarations between the parameter-name list and the body. But that's a side issue, really.