Here is a simple lexer grammar:
lexer grammar TextLexer;
@members
{
protected const int EOF = Eof;
protected const int HIDDEN = Hidden;
}
COMMENT: 'comment' .*? 'end' -> channel(HIDDEN);
WORD: [a-z]+ ;
WS
: ' ' -> channel(HIDDEN)
;
For the most part, it behaves as expected, grabbing the words out of the stream, and ignoring anything bounded by comment . . . end. But not always. For example, if the input is the following:
quick brown fox commentandending
it will see that the word "commentandending" is longer than the comment "commentandend". So it comes out with a token "commentandending" rather than a token "ing".
Is there a way to change that behavior?
This grammar will solve the problem in ANTLR4: