g4 file:
grammar TestFlow;
options
{
language=CSharp4;
output=AST;
}
/*
* Parser Rules
*/
compileUnit : LC | BC ;
/*
* Lexer Rules
*/
BC : '/*' .*? '*/' ;
LC : '//' .*? [\r\n] ;
Code:
var input = " /*aaa*/ /// \n ";
var stream = new AntlrInputStream(input);
ITokenSource lexer = new TestFlowLexer(stream);
ITokenStream tokens = new CommonTokenStream(lexer);
var parser = new TestFlowParser(tokens);
parser.BuildParseTree = true;
var tree = parser.compileUnit();
var n = tree.ChildCount;
var top = new List<string>();
for (int i = 0; i < n; i++) {
top.Add(tree.GetChild(i).GetText());
}
After running above code I get single string in top: /*aaa*/. The single-line comment isn't caught.
What's wrong?
All parser/lexer generation errors & warnings are significant. Both
optionsstatements are invalid in the current version of Antlr4.The runtime errors detail the root problem: unrecognizable input characters, specifically, the grammar does not handle whitespace. Add a lexer rule to fix:
While not necessarily a problem, it is good form to require the parser to process all input. The lexer will generate an
EOFtoken at the end of the source input. Fix the main rule to require theEOF:The correct way to allow for repetition is to use a
*or+operator: