I am trying to parse positive and negative decimals.
number(N) ::= pnumber(N1).
number(N) ::= nnumber(N1).
number(N) ::= pnumber(N1) DOT pnumber(N2).
number(N) ::= nnumber(N1) DOT pnumber(N2).
pnumber(N) ::= NUMBER(N1).
nnumber(N) ::= MINUS NUMBER(N1).
The inclusion of the first two rules gives a shift/reduce conflict but I don't know how I can write the grammar such that the conflict never occurs. I am using the Lemon parser.
Edit: conflicts from .out file
State 79:
(56) number ::= nnumber *
number ::= nnumber * DOT pnumber
DOT shift 39
DOT reduce 56 ** Parsing conflict **
{default} reduce 56 number ::= nnumber
State 80:
(55) number ::= pnumber *
number ::= pnumber * DOT pnumber
DOT shift 40
DOT reduce 55 ** Parsing conflict **
{default} reduce 55 number ::= pnumber
State 39:
number ::= nnumber DOT * pnumber
pnumber ::= * NUMBER
NUMBER shift-reduce 59 pnumber ::= NUMBER
pnumber shift-reduce 58 number ::= nnumber DOT pnumber
State 40:
number ::= pnumber DOT * pnumber
pnumber ::= * NUMBER
NUMBER shift-reduce 59 pnumber ::= NUMBER
pnumber shift-reduce 57 number ::= pnumber DOT pnumber
Edit 2: Minimal grammar that causes issue
start ::= prog.
prog ::= rule.
rule ::= REVERSE_IMPLICATION body DOT.
body ::= bodydef.
body ::= body CONJUNCTION bodydef.
bodydef ::= literal.
literal ::= variable.
variable ::= number.
number ::= pnumber.
number ::= nnumber.
number ::= pnumber DOT pnumber.
number ::= nnumber DOT pnumber.
pnumber ::= NUMBER.
nnumber ::= MINUS NUMBER.
The conflicts you show indicate a problem with how the
number
non-terminal is used, not withnumber
itself.The basic problem is that after seeing a
pnumber
ornnumber
, when the next token of lookahead is aDOT
, it can't decide if that should be the end of thenumber
(reduce, soDOT
is part of some other non-terminal after thenumber
), or if theDOT
should be treated as part of thenumber
(shifted so it can later reduce one of the p/nnumber DOT pnumber rules.)So in order to diagnose the problem, you'll need to show all the rules that use
number
anywhere on the right hand side (and recursively any other rules that use any of those rules' non-terminals on the right).Note that it is rarely useful to post just a fragment of a grammar, as the LR parser construction process depends heavily on the context of where the rules are used elsewhere in the grammar...
So the problem here is that you need two-token lookahead to differentiate between a
DOT
in a (real)number
literal
and aDOT
at the end of arule
.The easy fix is to let the lexer deal with it -- lexers can do small amounts of lookahead quite easily, so you can recognize
REAL_NUMBER
as a distinct non-terminal fromNUMBER
(probably still without the-
, so you'd end up withIt's much harder to remove the conflict by factoring the grammar but it can be done.
In general, to refactor a grammar to remove a lookahead conflict, you need to figure out the rules that manifest the conflict (
rule
andnumber
here) and refactor things to bring them together into rules that have common prefixes until you get far enough along to disambiguate.First, I'm going to assume there are other rules besides
number
that can appear here, as otherwise we could just eliminate all the intervening rules.We want to move the
number
rule "up" in the grammar to get it into the same place asrule
withDOT
. So we need to split the containing rules to special case when they end with anumber
. We add a suffix to denote the rules that correspond to the original rule with all versions that end in anumber
split off...and propagate that "up"
...and again
...and again
Notice that as you move it up, you need to split up more and more rules, so this process can blow up the grammar quite a bit. However, rules that are used only at the end of a rhs that you're refactoring will end up only needing the
_n
version, so you don't necessarily have to double the number of rules....last step
Now you have the DOTs in all the same places, so expand the
number
rules:and the shift-reduce conflicts are gone, because the rules have common prefixes up until past the needed lookahead to determine which to use. I've reduced the number of rules in this final expansion by adding