Parser implementations comparison: correctness confirmation and (perhaps) optimization

151 Views Asked by At

I've implemented two expression parsers, in recursive descent and operator precedence. They're implemented in Object Pascal. Here's the recursive descent:

function ParseFactor: PNode;
var
  Temp: PNode;
begin
  Result := ParsePrimary;
  if t.Kind in [tkDoubleAsterisks] then begin
    New(Temp);
    Temp^.Kind := nkPower;
    Temp^.LeftOperand := Result;
    // power operator is right associative
    Lexer.Get(t);
    Temp^.RightOperand := ParseFactor();
    Result := Temp;
  end;
end;

function ParseTerm: PNode;
var
  Temp: PNode;
begin
  Result := ParseFactor;
  while t.Kind in [tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan] do begin
    New(Temp);
    case t.Kind of
      tkAmpersand        : Temp^.Kind := nkAnd;
      tkAsterisk         : Temp^.Kind := nkMultiplication;
      tkSlash            : Temp^.Kind := nkDivision;
      tkDoubleLessThan   : Temp^.Kind := nkShiftLeft;
      tkDoubleGreaterThan: Temp^.Kind := nkShiftRight;
    end;
    Temp^.LeftOperand := Result;
    Lexer.Get(t);
    Temp^.RightOperand := ParseFactor;
    Result := Temp;
  end;
end;

function ParseExpression: PNode;
var
  Temp: PNode;
begin
  Result := ParseTerm;
  while t.Kind in [tkHorzBar,tkCaret,tkPlus,tkMinus] do begin
    New(Temp);
    case t.Kind of
      tkHorzBar: Temp^.Kind := nkOr;
      tkCaret  : Temp^.Kind := nkXor;
      tkPlus   : Temp^.Kind := nkAddition;
      tkMinus  : Temp^.Kind := nkSubstraction;
    end;
    Temp^.LeftOperand := Result;
    Lexer.Get(t);
    Temp^.RightOperand := ParseTerm;
    Result := Temp;
  end;
end;

and here's the operator precedence:

function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
  case Kind of
    tkHorzBar,tkCaret,tkPlus,tkMinus:
      Result := 1;
    tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan:
      Result := 2;
    tkDoubleAsterisks:
      Result := 3;
    else
      Result := -1;
  end;
end;

function IsRightAssociative(const Kind: TTokenKind): Boolean; inline;
begin
  Result := Kind in [tkDoubleAsterisks];
end;

function ParseBinaryRHSExpression(LHS: PNode; const CurrentPrecedence: Integer): PNode;
var
  Op: TTokenKind;
  RHS: PNode;
begin
  while GetTokenPrecedence(t.Kind) >= CurrentPrecedence do begin
    Op := t.Kind;
    Lexer.Get(t);
    RHS := ParsePrimary;
    while (GetTokenPrecedence(t.Kind) > GetTokenPrecedence(Op))
      or (IsRightAssociative(t.Kind) and (GetTokenPrecedence(t.Kind) >= GetTokenPrecedence(Op)))
    do begin
      RHS := ParseBinaryRHSExpression(RHS,GetTokenPrecedence(t.Kind));
    end;
    New(Result);
    case Op of
      tkHorzBar          : Result^.Kind := nkOr;
      tkCaret            : Result^.Kind := nkXor;
      tkPlus             : Result^.Kind := nkAddition;
      tkMinus            : Result^.Kind := nkSubstraction;
      tkAmpersand        : Result^.Kind := nkAnd;
      tkAsterisk         : Result^.Kind := nkMultiplication;
      tkSlash            : Result^.Kind := nkDivision;
      tkDoubleLessThan   : Result^.Kind := nkShiftLeft;
      tkDoubleGreaterThan: Result^.Kind := nkShiftRight;
      tkDoubleAsterisks  : Result^.Kind := nkPower;
    end;
    Result^.LeftOperand := LHS;
    Result^.RightOperand := RHS;
    LHS := Result;
  end;
  Result := LHS;
end;

function ParseExpression: PNode;
begin
  Result := ParsePrimary;
  if Assigned(Result) then begin
    Result := ParseBinaryRHSExpression(Result,0);
  end;
end;

Those are the only essential difference between the two. Some simple tests show that both seem to parse correctly. However, I'm not really sure about the operator precedence implementation because this is the first time I really implement it. And the surprising fact which bothers me, it runs slower than the recursive descent version (it takes 1.5 more times)! My compiler classes and all articles I read states that operator precedence parsing should be more efficient than recursive descent due to fewer function calls and that's what I'm expecting as well since the code seems to show that. I've inline-d additional functions to get the operator precedence and right-associativity testing but this doesn't seem to help much. Please tell me whether I did something wrong. Feel free to ask for clarity of certain things.

1

There are 1 best solutions below

3
On

As with all things, tradeoffs matter. Recursive descent checks for each operator token explicitly, so if you have N operators, it must essentially do N tests. Operator precedence only has to know that something is an operator token, and use a lookup table. So instead of N tests, it can use just a few lookups. So operator precedence should be faster if you have a lot of operators. If your grammar has only a few, then it isn't clear its a win even if carefully tuned.

In the grand scheme of things, the speed of the parser probably doesn't matter. Whatever tool you are building that uses the parser will pretty much expend more effort someplace other than the parser.

Operator precedence was an interesting idea when machines were small because one could encode complex operator hierarchies in the table. Mostly the tradeoffs it offered don't matter for the typical workstration (or even hand-held). I'd stick to recursive descent for simple grammars, and to parser generators of whatever kind seems convenient for anything else.