How to convert an ANTLR4 grammar file to a tree-sitter grammar file?

Does anyone know of any tool(s) that can convert ANTLR v4 grammar files (.g4 extension) to tree-sitter grammar files (.js extension)? It would also be fine if I had to chain a couple conversion tools together. For example, going from foo.g4 (antlr4) to foo.ebnf (intermediary format) to foo.js (tree-sitter). Thank you!

I tried using this tool to go from g4 to ebnf, and then this tool to go from ebnf to tree-sitter js, but to no avail. The first tool seemed to create some junk at the bottom of the file which gave the second tool trouble. Additionally, the second tool seems to expect each definition to be completely on one line (and the first tool breaks each definition up into multiple lines for readability).

There are 1 best solutions below

This task requires a transpiler (a source-to-source compiler). A very early version of such a transpiler is my lezer-parser-import (lezer-parser and tree-sitter are very similar).

Transpiling the basic syntax is trivial ...
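
To give an idea of what "basic syntax" means here, below is a minimal sketch in Python. It is not how lezer-parser-import works; it is only an illustration under stated assumptions. It handles a tiny ANTLR subset (quoted literals, rule references, grouping, `|`, `*`, `+`, `?`) and emits the corresponding tree-sitter DSL calls (`seq`, `choice`, `repeat`, `repeat1`, `optional`). The function names (`tokenize`, `transpile_rule`, and so on) are hypothetical, and anything outside the subset is rejected with an error.

```python
import re

# Tokens of a tiny ANTLR subset: quoted literals, identifiers, operators.
TOKEN = re.compile(r"\s*('(?:\\.|[^'\\])*'|[A-Za-z_]\w*|[()|*+?])")

# ANTLR suffix operator -> tree-sitter DSL function.
SUFFIX = {"*": "repeat", "+": "repeat1", "?": "optional"}


def tokenize(body):
    """Split a rule body into tokens; fail loudly on unsupported syntax."""
    tokens, pos = [], 0
    body = body.strip()
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if not m:
            raise ValueError(f"unsupported syntax at: {body[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_alternatives(tokens):
    """a | b | c  ->  choice(a, b, c)"""
    options = [parse_sequence(tokens)]
    while tokens and tokens[0] == "|":
        tokens.pop(0)
        options.append(parse_sequence(tokens))
    return options[0] if len(options) == 1 else f"choice({', '.join(options)})"


def parse_sequence(tokens):
    """a b c  ->  seq(a, b, c)"""
    items = []
    while tokens and tokens[0] not in ("|", ")"):
        items.append(parse_item(tokens))
    if not items:
        raise ValueError("empty alternatives are not handled in this sketch")
    return items[0] if len(items) == 1 else f"seq({', '.join(items)})"


def parse_item(tokens):
    """A literal, a rule reference or a group, followed by * + ? suffixes."""
    tok = tokens.pop(0)
    if tok == "(":
        expr = parse_alternatives(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing ')'")
    elif tok.startswith("'"):
        expr = tok  # ANTLR single-quoted literal, reused as a JS string
    elif tok[0].isalpha() or tok[0] == "_":
        expr = f"$.{tok}"  # rule reference
    else:
        raise ValueError(f"unexpected token {tok!r}")
    while tokens and tokens[0] in SUFFIX:
        expr = f"{SUFFIX[tokens.pop(0)]}({expr})"
    return expr


def transpile_rule(rule):
    """Turn one ANTLR rule 'name: body ;' into one tree-sitter rule entry."""
    name, body = rule.rstrip().rstrip(";").split(":", 1)
    tokens = tokenize(body)
    expr = parse_alternatives(tokens)
    if tokens:
        raise ValueError(f"unexpected trailing tokens: {tokens}")
    return f"{name.strip()}: $ => {expr},"


print(transpile_rule("expr: term (('+' | '-') term)* ;"))
# expr: $ => seq($.term, repeat(seq(choice('+', '-'), $.term))),
```

The emitted line is meant to be pasted into the `rules: { ... }` object of a tree-sitter `grammar.js`. Lexer commands such as `-> skip`, negation, and non-greedy operators are deliberately rejected, because they are where the real work starts.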

Challenges:

  • Solve parsing conflicts by adding precedence to tokens. lezer-parser also has ambiguity markers; tree-sitter, which is also a GLR parser, has a conflicts list and prec.dynamic for the same purpose. Solution concept: generate short valid source texts (fuzzing, unparsing, ref 1, ref 2, ref 3), compare the parse trees (treediff), and add markers until the result matches the ANTLR parse tree.
  • Translate negations. In ANTLR, you can negate tokens with a tilde prefix: ~some_token. This does not work in tree-sitter or lezer-parser, so you probably need to generate code for an external scanner (C code for tree-sitter, JavaScript code for lezer-parser).
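  • Translate regex tokens. In ANTLR, you can lex a block comment with BlockComment: '/*' .*? '*/' -> skip;. In tree-sitter and lezer-parser, the non-greedy .*? '*/' part does not carry over directly: you need to rewrite it as an equivalent greedy pattern or, in the general case, fall back to an external scanner.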
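
For the block-comment case specifically, one option that avoids an external scanner is to rewrite the non-greedy rule as a greedy regex with the same matches. The sketch below uses Python's `re` module only as a sanity check that the two patterns agree on a few inputs. It says nothing about tree-sitter's own regex engine, and the helper names (`first_comment`, `patterns_agree`) are hypothetical. As far as I know, a pattern of this shape is what published tree-sitter grammars use for C-style comments, but treat that as an assumption and check the grammar you are targeting.

```python
import re

# ANTLR:  BlockComment: '/*' .*? '*/' -> skip;
# Reference semantics: shortest match, where '.' also matches newlines.
NON_GREEDY = re.compile(r"/\*.*?\*/", re.DOTALL)

# Same language without a non-greedy quantifier:
#   "/*", any non-stars, one or more stars,
#   then zero or more times (a char that is neither "/" nor "*", non-stars, stars),
#   then "/".
GREEDY = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

# Assumed tree-sitter spelling of the greedy pattern (not executed here):
#   comment: $ => token(seq('/*', /[^*]*\*+([^/*][^*]*\*+)*/, '/')),


def first_comment(pattern, text):
    """Return the first block comment found by `pattern`, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


def patterns_agree(text):
    """True if both patterns find the same first comment in `text`."""
    return first_comment(NON_GREEDY, text) == first_comment(GREEDY, text)


samples = [
    "/* a */ x /* b */",
    "/**/",
    "/*** x **/ y */",
    "/* a * b / c */",
    "a /* line1\nline2 */ b",
    "/* unterminated",
]
for sample in samples:
    assert patterns_agree(sample), sample
```

This trick only works because the terminator `*/` is fixed and short. For negated tokens, or for terminators that depend on context, an external scanner is still the more realistic route.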