I wanted to make a reader which reads configuration files similar to INI files for mswin. It is for exercise to teach myself using a lexer/parser generator which I made. The grammar is:
%lexer
HEADER ::= "\\[[0-9a-zA-Z]+\\]"
TRUE ::= "yes|true"
FALSE ::= "no|false"
ASSIGN ::= "="
OPTION_NAME ::= "[a-zA-Z][0-9a-zA-Z]*"
INT ::= "[0-9]+"
STRING ::= "\"(\\\"|[^\"])*\""
CODE ::= "<{(.*)}>"
BLANK ::= "[ \t\f]+" :ignore
COMMENT ::= "#[^\n\r]*(\r|\n)?" :ignore
NEWLINE ::= "\r|\n"
%parser
Options ::= OptionGroup Options | OptionGroup | @epsilon@
OptionGroup ::= HEADER NEWLINE OptionsList
OptionsList ::= Option NEWLINE OptionsList | Option
Option ::= OPTION_NAME ASSIGN OptionValue
OptionValue ::= TRUE | FALSE | INT | STRING | CODE
The problem lies in the @epsilon@
production. I added it because I want my reader to accept also empty files. But I'm getting conflicts when 'OptionsList' or 'OptionGroup' contains an epsilon production. I tried rearrange elements in productions, but I'm only getting conflicts (r/r or s/r, depending of what I did), unless I remove the epsilon completely from my grammar. It removes the problem, but...in my logic one of 'OptionsList' or 'OptionGroup' should contain an epsilon, otherwise my goal to accepting empty files is not met.
My parser generator uses LR(1) method, so I thought I can use epsilon productions in my grammar. It seems I'm good at writing generators, but not in constructing error-less grammars :(.
Should I forget about epsilons? Or is my grammar accepting empty inputs even when there is no epsilon production?
Your
Options
production allows anOptions
to be a sequence ofOptionGroup
s, starting with either an empty list or a list consisting of a single element. That's obviously ambiguous, because a list of exactly oneOptionGroup
could be:OptionGroup
@epsilon@
with the addition of anOptionGroup
.In short, instead of
you need
which matches exactly the same set of sentences, but unambiguously.
In general terms, you are usually better off writing left-recursive rules for bottom-up parsers. So I would have written