I am trying to get operator precedence correct in a Treesitter grammar. Treesitter is a LR1 parser generator.
I have a straightforward artithmetic grammar, which partially looks like this:
multiply_expression: $ => prec.left(2, seq(
$._expression,
'*',
$._expression,
)),
addition_expression: $ => prec.left(1, seq(
$._expression,
'+',
$._expression,
)),
This works correctly. multiply_expression
indeed gets a higher precedence than addition_expression
.
However, the precedence changes when I add an intermediate rule:
_partial_multi: $ => seq(
$._expression,
'*',
),
multiply_expression: $ => prec.left(2, seq(
$._partial_multi,
$._expression,
)),
I moved $.expression, '*'
to its own rule. To me, this seems to be an equivalent grammar, and I expect no changes. However, with this change the precedence is no longer correct. addition_expression
, which remained unchanged, seems to have a higher precedence than multiply_expression
.
Why does introducing an extra step change the precedence? Is there a name for this problem, or where can I find more information about it? When writing a grammar or fixing precedence problems, are there rules to follow or ways to think about this?
Here is your full grammar, for reproducibility:
You can fix this issue by adding precedence to the
_partial_multi
rule and removing left-associative precedence from themultiply_expression
rule:What you've done here is make multiplication a right-associative operator of precedence 2. This is how you define left- or right-associativity in grammars which don't expose it as a primitive. You can make multiplication left-associative by writing it as follows:
You've actually stumbled across something quite interesting, which is that you don't need explicit language constructs to define precedence & associativity in a grammar! They're just "syntactic sugar" to make the grammar easier to read & write. See this page for more information on how to specify precedence & associativity by decomposing rules. You can see that specifying precedence & associativity purely through grammar constructs is confusing, and almost reads backwards from what you expect unless you think about it! As you also discovered, mixing these two approaches (specifying precedence & associativity through language constructs and grammar constructs) can lead to confusing behavior. It is best to stick with one or the other.