Performance of compiler decreases drastically when using backtrack option

156 Views Asked by At

I've implemented a Transpiler for a C-like language using the C# version of ANTLR3 on Windows7.

To parse if/else statements I use the following rule:

ifStatement
: 'if' '(' expression ')' b1=block
   (
       'else' b2=block      -> ^('if' expression $b1 $b2 'else')
   |   'else' ifStatement   -> ^('if' expression $b1 ^(ELSIF ifStatement))
   |                        -> ^('if' expression $b1)
   )
;

To translate the tree generated by this rule from my own language into C# I use the following grammar snippet:

ifStatement
options { backtrack = true; }
: 
     ^(n='if' expression b1=block)
     -> if(
          node={$n},
          cond={$expression.st},
          block1={$b1.st},
          block2={null},
          isElsif={($n.Parent.Text == "ELSIF") ? "true" : null},
          node2={null}
        )
|
     ^(n='if' expression b1=block b2=block n2='else')
     -> if(
          node={$n},
          cond={$expression.st},
          block1={$b1.st},
          block2={$b2.st},
          isElsif={($n.Parent.Text == "ELSIF") ? "true" : null},
          node2={$n2}
        )
|
     ^(n='if' expression b1=block b2=ifStatement)
     -> elsif(
          node={$n},
          cond={$expression.st},
          block1={$b1.st},
          block2={$b2.st},
          isElsif={($n.Parent.Text == "ELSIF") ? "true" : null}
        )
|
     ^(ELSIF i=ifStatement) -> { $i.st }
;

This works fine in most cases, for example it translates the following code without problems:

if (x == "1") {
}
else if (x == "2") {
}
else if (x == "3") {
}

But when I have more than 20 "else if" clauses, the compiler needs several minutes to do its work. The time needed for compilation does not increase linearly, that is the compiler returns immediately for 17 or 18 "else if" clauses.

Update:

I have fixed the issue. I've replaced

     ^(n='if' expression b1=block b2=ifStatement)
     -> elsif(
          node={$n},
          cond={$expression.st},
          block1={$b1.st},
          block2={$b2.st},
          isElsif={($n.Parent.Text == "ELSIF") ? "true" : null}
        )
|
     ^(ELSIF i=ifStatement) -> { $i.st }

with

     ^(n='if' expression b1=block ^(ELSIF b2=ifStatement))
     -> elsif(
          node={$n},
          cond={$expression.st},
          block1={$b1.st},
          block2={$b2.st},
          isElsif={($n.Parent.Text == "ELSIF") ? "true" : null}
        )

That is I've merged the last two alternatives into a single one.

Now the Transpiler is blazingly fast and still returns correct results.

I'd really like to know why this change makes such a big difference.

3

There are 3 best solutions below

3
On BEST ANSWER

Welcome to backtracking.

If you force the parser to choose between (for example) three choices (as you have in your grammar), and it has to backtrack, and the choices nest [as your if-then-else does], then a nested set of N constructs can require 3^N units of work to parse. 3^20 is 27 times as long as 3^17.

Lesson: backtracking is useful sometimes, but generally you should avoid it.

For your grammar, why not treat the if construct just like every other statement? Then it doesn't show up as a special case, and you can avoid the backtracking altogether. You'll even get the standard "else attaches to the closest if" rule that is standard in most programming langauges.

2
On

Try to rewrite your grammar without backtracking on the entire rule. If you can't, then at the very least add memoize=true to it to speed things up:

options { backtrack = true; memoize=true; }
1
On

Not sure why you need backtracking for this rule at all. Can block also be an ifStatement? If so add a semantic predicate for the else ifStatement part to check if the next token is actually IF. Otherwise you can always take the block alternative.

Also, avoid tree rewrites if you can. I found that creating and rewriting an AST can double the parse time.