I am creating a parser with Lark. The parser works fine for most of the tests I ran, but it fails with the define keyword: a define statement is only recognized if it is followed by an assignment. define a = 10 works just fine, but define b is not treated as a define statement.
Here is the Lark parser:
import lark
# ...
parser = lark.Lark("""
?start: statements
?statements: ((expr (";" | NEWLINE) | NEWLINE ) )* expr?
?expr: identifier | number | functioncall | define | assignment | function
?functioncall: identifier "(" arguments? ")"
?arguments: expr ("," expr)*
?define: "define" identifier ("=" expr)?
?assignment: identifier "=" expr
?function: "function" "(" parameters? ")" "->" identifier block
?parameters: identifier ("," identifier)*
?block: "{" statements "}"
?identifier: NAME -> identifier
?number: NUMBER -> number
%import common.NEWLINE
%import common.CNAME -> NAME
%import common.NUMBER
%import common.WS_INLINE
%ignore WS_INLINE
COMMENT: "/*" /(.|\n)+/x "*/" | "//" /.+/ NEWLINE?
%ignore COMMENT
""")
My tests:
tree = parser.parse("define a = 10")
assert(tree.data == "define") # OK
tree = parser.parse("define b")
assert(tree.data == "define") # NOT OK - tree.data is "identifier"
Specifically, parser.parse("define b") and parser.parse("b") give the exact same result. I would expect parser.parse("define b") to give a tree beginning with the define rule, but instead I have an identifier.
Lark does not always keep a rule as its own node in the tree: a rule whose name starts with ? is inlined when it ends up with a single child. The "define" keyword is filtered out of the tree, so define b leaves only the identifier, and both define b and b give Tree(identifier, [Token(NAME, 'b')]). To be able to distinguish the two, you need to force Lark to name the node, which can be done by adding -> name_of_rule at the end of the line in the parser definition. So, for instance, the definition of the ?define rule should become: