LR(1) Parser: Why adding an epsilon production makes r/r or s/r conflicts

265 Views Asked by At

I wanted to make a reader which reads configuration files similar to INI files for mswin. It is for exercise to teach myself using a lexer/parser generator which I made. The grammar is:

%lexer
HEADER ::= "\\[[0-9a-zA-Z]+\\]"
TRUE ::= "yes|true"
FALSE ::= "no|false"
ASSIGN ::= "="
OPTION_NAME ::= "[a-zA-Z][0-9a-zA-Z]*"
INT ::= "[0-9]+"
STRING ::= "\"(\\\"|[^\"])*\""
CODE ::= "<{(.*)}>"
BLANK ::= "[ \t\f]+" :ignore
COMMENT ::= "#[^\n\r]*(\r|\n)?" :ignore
NEWLINE ::= "\r|\n"
%parser
Options ::= OptionGroup Options | OptionGroup | @epsilon@
OptionGroup ::= HEADER NEWLINE OptionsList 
OptionsList ::= Option NEWLINE OptionsList | Option 
Option ::= OPTION_NAME ASSIGN OptionValue
OptionValue ::= TRUE | FALSE | INT | STRING | CODE 

The problem lies in the @epsilon@ production. I added it because I want my reader to accept also empty files. But I'm getting conflicts when 'OptionsList' or 'OptionGroup' contains an epsilon production. I tried rearrange elements in productions, but I'm only getting conflicts (r/r or s/r, depending of what I did), unless I remove the epsilon completely from my grammar. It removes the problem, but...in my logic one of 'OptionsList' or 'OptionGroup' should contain an epsilon, otherwise my goal to accepting empty files is not met.

My parser generator uses LR(1) method, so I thought I can use epsilon productions in my grammar. It seems I'm good at writing generators, but not in constructing error-less grammars :(.

Should I forget about epsilons? Or is my grammar accepting empty inputs even when there is no epsilon production?

1

There are 1 best solutions below

12
On BEST ANSWER

Your Options production allows an Options to be a sequence of OptionGroups, starting with either an empty list or a list consisting of a single element. That's obviously ambiguous, because a list of exactly one OptionGroup could be:

  • The base case OptionGroup
  • The base case @epsilon@ with the addition of an OptionGroup.

In short, instead of

Options ::= OptionGroup Options | OptionGroup | @epsilon@

you need

Options ::= OptionGroup Options | @epsilon@

which matches exactly the same set of sentences, but unambiguously.

In general terms, you are usually better off writing left-recursive rules for bottom-up parsers. So I would have written

Options ::= Options OptionGroup | @epsilon@