Parser implementations comparison: correctness confirmation and (perhaps) optimization

Question

Parser implementations comparison: correctness confirmation and (perhaps) optimization

154 Views Asked by LeleDumbo At 15 June 2025 at 17:34

I've implemented two expression parsers, in recursive descent and operator precedence. They're implemented in Object Pascal. Here's the recursive descent:

function ParseFactor: PNode;
var
  Temp: PNode;
begin
  Result := ParsePrimary;
  if t.Kind in [tkDoubleAsterisks] then begin
    New(Temp);
    Temp^.Kind := nkPower;
    Temp^.LeftOperand := Result;
    // power operator is right associative
    Lexer.Get(t);
    Temp^.RightOperand := ParseFactor();
    Result := Temp;
  end;
end;

function ParseTerm: PNode;
var
  Temp: PNode;
begin
  Result := ParseFactor;
  while t.Kind in [tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan] do begin
    New(Temp);
    case t.Kind of
      tkAmpersand        : Temp^.Kind := nkAnd;
      tkAsterisk         : Temp^.Kind := nkMultiplication;
      tkSlash            : Temp^.Kind := nkDivision;
      tkDoubleLessThan   : Temp^.Kind := nkShiftLeft;
      tkDoubleGreaterThan: Temp^.Kind := nkShiftRight;
    end;
    Temp^.LeftOperand := Result;
    Lexer.Get(t);
    Temp^.RightOperand := ParseFactor;
    Result := Temp;
  end;
end;

function ParseExpression: PNode;
var
  Temp: PNode;
begin
  Result := ParseTerm;
  while t.Kind in [tkHorzBar,tkCaret,tkPlus,tkMinus] do begin
    New(Temp);
    case t.Kind of
      tkHorzBar: Temp^.Kind := nkOr;
      tkCaret  : Temp^.Kind := nkXor;
      tkPlus   : Temp^.Kind := nkAddition;
      tkMinus  : Temp^.Kind := nkSubstraction;
    end;
    Temp^.LeftOperand := Result;
    Lexer.Get(t);
    Temp^.RightOperand := ParseTerm;
    Result := Temp;
  end;
end;

and here's the operator precedence:

function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
  case Kind of
    tkHorzBar,tkCaret,tkPlus,tkMinus:
      Result := 1;
    tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan:
      Result := 2;
    tkDoubleAsterisks:
      Result := 3;
    else
      Result := -1;
  end;
end;

function IsRightAssociative(const Kind: TTokenKind): Boolean; inline;
begin
  Result := Kind in [tkDoubleAsterisks];
end;

function ParseBinaryRHSExpression(LHS: PNode; const CurrentPrecedence: Integer): PNode;
var
  Op: TTokenKind;
  RHS: PNode;
begin
  while GetTokenPrecedence(t.Kind) >= CurrentPrecedence do begin
    Op := t.Kind;
    Lexer.Get(t);
    RHS := ParsePrimary;
    while (GetTokenPrecedence(t.Kind) > GetTokenPrecedence(Op))
      or (IsRightAssociative(t.Kind) and (GetTokenPrecedence(t.Kind) >= GetTokenPrecedence(Op)))
    do begin
      RHS := ParseBinaryRHSExpression(RHS,GetTokenPrecedence(t.Kind));
    end;
    New(Result);
    case Op of
      tkHorzBar          : Result^.Kind := nkOr;
      tkCaret            : Result^.Kind := nkXor;
      tkPlus             : Result^.Kind := nkAddition;
      tkMinus            : Result^.Kind := nkSubstraction;
      tkAmpersand        : Result^.Kind := nkAnd;
      tkAsterisk         : Result^.Kind := nkMultiplication;
      tkSlash            : Result^.Kind := nkDivision;
      tkDoubleLessThan   : Result^.Kind := nkShiftLeft;
      tkDoubleGreaterThan: Result^.Kind := nkShiftRight;
      tkDoubleAsterisks  : Result^.Kind := nkPower;
    end;
    Result^.LeftOperand := LHS;
    Result^.RightOperand := RHS;
    LHS := Result;
  end;
  Result := LHS;
end;

function ParseExpression: PNode;
begin
  Result := ParsePrimary;
  if Assigned(Result) then begin
    Result := ParseBinaryRHSExpression(Result,0);
  end;
end;

Those are the only essential difference between the two. Some simple tests show that both seem to parse correctly. However, I'm not really sure about the operator precedence implementation because this is the first time I really implement it. And the surprising fact which bothers me, it runs slower than the recursive descent version (it takes 1.5 more times)! My compiler classes and all articles I read states that operator precedence parsing should be more efficient than recursive descent due to fewer function calls and that's what I'm expecting as well since the code seems to show that. I've inline-d additional functions to get the operator precedence and right-associativity testing but this doesn't seem to help much. Please tell me whether I did something wrong. Feel free to ask for clarity of certain things.

Original Q&A

There are 1 best solutions below

**Ira Baxter** · Answer 1

As with all things, tradeoffs matter. Recursive descent checks for each operator token explicitly, so if you have N operators, it must essentially do N tests. Operator precedence only has to know that something is an operator token, and use a lookup table. So instead of N tests, it can use just a few lookups. So operator precedence should be faster if you have a lot of operators. If your grammar has only a few, then it isn't clear its a win even if carefully tuned.

In the grand scheme of things, the speed of the parser probably doesn't matter. Whatever tool you are building that uses the parser will pretty much expend more effort someplace other than the parser.

Operator precedence was an interesting idea when machines were small because one could encode complex operator hierarchies in the table. Mostly the tradeoffs it offered don't matter for the typical workstration (or even hand-held). I'd stick to recursive descent for simple grammars, and to parser generators of whatever kind seems convenient for anything else.

Parser implementations comparison: correctness confirmation and (perhaps) optimization

There are 1 best solutions below

Related Questions in PARSING

Related Questions in COMPARISON

Related Questions in OPERATOR-PRECEDENCE

Related Questions in RECURSIVE-DESCENT

Trending Questions

Popular # Hahtags

Popular Questions