Removing indirect left recursion from this grammar

394 Views Asked by At

I'm trying to figure out how to remove the indirect left recursion from the logical keyword expressions within my Rust port of a Ruby parser (https://github.com/kenaniah/ruby-parser/blob/master/ruby-parser/src/parsers/expression/logical.rs). The grammar looks like:

E --> N | A | O | t
N --> n E
A --> E a E
O --> E o E
E = expression
A = keyword_and_expression
O = keyword_or_expression
N = keyword_not_expression

How would I go about transforming this to remove the recursion in A and O?

2

There are 2 best solutions below

3
On

I think the problem here is not the indirect recursion but rather the ambiguity.

If it were just indirect recursion, you could simply substitute the right-hand sides of N, A and O, eliminating the indirect recursion:

E → n E
  | E a E
  | E o E
  | t

In order to get rid of the direct left recursion, you can left-factor:

E → n E
  | E A'
  | t
A'→ a E
  | o E

and then remove the remaining left-recursion:

E → n E E'
  | t E'
E'→ ε
  | A' E'
A'→ a E
  | o E

That has no left-recursion, direct or indirect, and every rule starts with a unique terminal. However, it's not LL(1), because there's a first/follow conflict.

That's really not surprising, because the original grammar was highly ambiguous, and left-recursion elimination doesn't eliminate ambiguity. The original grammar really only makes sense if accompanied by an operator precedence table.

That table indicates that AND and OR are left-associative operators with the same precedence (a slightly unusual decision), while NOT is a unary operator with higher precedence. That, in turn, means that the BNF should have been written something like this:

N → n N
  | t
E → A
  | O
  | N
A → E a N
O → E o N
N → n N
  | t

The only difference from the grammar in the OP is the removal of ambiguity, as indicated by the precedence table.

Again, the first step is to substitute non-terminals A and O in order to make the left-recursion direct:

E → E a N
  | E o N
  | N
N → n N
  | t

This is essentially the same grammar as the grammar for arithmetic expressions (ignoring multiplication, since there's only one precedence level), and the left-recursion can be eliminated directly:

E  → N E'
E' → a E
   | o E
   | ε
N  → n N
   | t
0
On

According to this factorization tool, the resulting grammar would be:

E  -> N
    | A
    | O
    | t
N  -> n E
A  -> n E a E A'
    | O a E A'
    | t a E A'
O  -> n E o E O'
    | n E a E A' o E O'
    | t a E A' o E O'
    | t o E O'
A' -> a E A'
    | ϵ
O' -> a E A' o E O'
    | o E O'
    | ϵ

Looks like the factorizations for A and O ended up being rather complex thanks to the multiple productions of E.