I've implemented two expression parsers, in recursive descent and operator precedence. They're implemented in Object Pascal. Here's the recursive descent:
function ParseFactor: PNode;
var
Temp: PNode;
begin
Result := ParsePrimary;
if t.Kind in [tkDoubleAsterisks] then begin
New(Temp);
Temp^.Kind := nkPower;
Temp^.LeftOperand := Result;
// power operator is right associative
Lexer.Get(t);
Temp^.RightOperand := ParseFactor();
Result := Temp;
end;
end;
function ParseTerm: PNode;
var
Temp: PNode;
begin
Result := ParseFactor;
while t.Kind in [tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan] do begin
New(Temp);
case t.Kind of
tkAmpersand : Temp^.Kind := nkAnd;
tkAsterisk : Temp^.Kind := nkMultiplication;
tkSlash : Temp^.Kind := nkDivision;
tkDoubleLessThan : Temp^.Kind := nkShiftLeft;
tkDoubleGreaterThan: Temp^.Kind := nkShiftRight;
end;
Temp^.LeftOperand := Result;
Lexer.Get(t);
Temp^.RightOperand := ParseFactor;
Result := Temp;
end;
end;
function ParseExpression: PNode;
var
Temp: PNode;
begin
Result := ParseTerm;
while t.Kind in [tkHorzBar,tkCaret,tkPlus,tkMinus] do begin
New(Temp);
case t.Kind of
tkHorzBar: Temp^.Kind := nkOr;
tkCaret : Temp^.Kind := nkXor;
tkPlus : Temp^.Kind := nkAddition;
tkMinus : Temp^.Kind := nkSubstraction;
end;
Temp^.LeftOperand := Result;
Lexer.Get(t);
Temp^.RightOperand := ParseTerm;
Result := Temp;
end;
end;
and here's the operator precedence:
function GetTokenPrecedence(const Kind: TTokenKind): Integer; inline;
begin
case Kind of
tkHorzBar,tkCaret,tkPlus,tkMinus:
Result := 1;
tkAmpersand,tkAsterisk,tkSlash,tkDoubleLessThan,tkDoubleGreaterThan:
Result := 2;
tkDoubleAsterisks:
Result := 3;
else
Result := -1;
end;
end;
function IsRightAssociative(const Kind: TTokenKind): Boolean; inline;
begin
Result := Kind in [tkDoubleAsterisks];
end;
function ParseBinaryRHSExpression(LHS: PNode; const CurrentPrecedence: Integer): PNode;
var
Op: TTokenKind;
RHS: PNode;
begin
while GetTokenPrecedence(t.Kind) >= CurrentPrecedence do begin
Op := t.Kind;
Lexer.Get(t);
RHS := ParsePrimary;
while (GetTokenPrecedence(t.Kind) > GetTokenPrecedence(Op))
or (IsRightAssociative(t.Kind) and (GetTokenPrecedence(t.Kind) >= GetTokenPrecedence(Op)))
do begin
RHS := ParseBinaryRHSExpression(RHS,GetTokenPrecedence(t.Kind));
end;
New(Result);
case Op of
tkHorzBar : Result^.Kind := nkOr;
tkCaret : Result^.Kind := nkXor;
tkPlus : Result^.Kind := nkAddition;
tkMinus : Result^.Kind := nkSubstraction;
tkAmpersand : Result^.Kind := nkAnd;
tkAsterisk : Result^.Kind := nkMultiplication;
tkSlash : Result^.Kind := nkDivision;
tkDoubleLessThan : Result^.Kind := nkShiftLeft;
tkDoubleGreaterThan: Result^.Kind := nkShiftRight;
tkDoubleAsterisks : Result^.Kind := nkPower;
end;
Result^.LeftOperand := LHS;
Result^.RightOperand := RHS;
LHS := Result;
end;
Result := LHS;
end;
function ParseExpression: PNode;
begin
Result := ParsePrimary;
if Assigned(Result) then begin
Result := ParseBinaryRHSExpression(Result,0);
end;
end;
Those are the only essential difference between the two. Some simple tests show that both seem to parse correctly. However, I'm not really sure about the operator precedence implementation because this is the first time I really implement it. And the surprising fact which bothers me, it runs slower than the recursive descent version (it takes 1.5 more times)! My compiler classes and all articles I read states that operator precedence parsing should be more efficient than recursive descent due to fewer function calls and that's what I'm expecting as well since the code seems to show that. I've inline-d additional functions to get the operator precedence and right-associativity testing but this doesn't seem to help much. Please tell me whether I did something wrong. Feel free to ask for clarity of certain things.
As with all things, tradeoffs matter. Recursive descent checks for each operator token explicitly, so if you have N operators, it must essentially do N tests. Operator precedence only has to know that something is an operator token, and use a lookup table. So instead of N tests, it can use just a few lookups. So operator precedence should be faster if you have a lot of operators. If your grammar has only a few, then it isn't clear its a win even if carefully tuned.
In the grand scheme of things, the speed of the parser probably doesn't matter. Whatever tool you are building that uses the parser will pretty much expend more effort someplace other than the parser.
Operator precedence was an interesting idea when machines were small because one could encode complex operator hierarchies in the table. Mostly the tradeoffs it offered don't matter for the typical workstration (or even hand-held). I'd stick to recursive descent for simple grammars, and to parser generators of whatever kind seems convenient for anything else.