I'm working on a combined grammar in ANTLR4 using ANTLRWorks 2.1. The lexer rules Identifier and Block are not being recognized as defined lexer rules, but only in the last parser rule (classdef). Adding a literal after the references to these rules removes (or hides) the errors.
My grammar, with the error in the last rule (the tokens wrapped in underscores, _Identifier_ and _Block_, are the ones throwing the error):
grammar GCombined;
options { language = Cpp; }
@lexer::namespace{AntlrTest01}
@parser::namespace{AntlrTest01}
/* First Lexer Stage */
Bit: '0' | '1';
Digit : '0'..'9';
ODigit: '0'..'7';
XDigit: '0'..'f';
Letter: ('a'..'z') | ('A'..'Z');
Symbol: '|'
| '-'
| '!'
| '#'
| '$'
| '%'
| '&'
| '('
| ')'
| '*'
| '+'
| ','
| '-'
| '.'
| '/'
| ':'
| ';'
| '<'
| '='
| '>'
| '?'
| '@'
| '['
| ']'
| '^'
| '_'
| '`'
| '{'
| '|'
| '}'
| '~';
WSpace: ( ' '
| '\t'
| '\r'
| '\n'
| '\c'
| '\0'
| '\u000C'
)+ -> skip;
DNumber: Digit+;
ONumber: '0o' Digit+;
XNumber: '0x' Digit;
Integer: DNumber
| ONumber
| XNumber;
Float: DNumber '.' DNumber;
Character: Letter
| Digit
| Symbol
| WSpace;
String: Character+;
Literal: '"' String '"';
Boolean: 'true' | 'false';
/* Second Lexer Stage */
Number: Integer | Float;
Identifier: Letter (Letter | Digit | '_')+;
Keyword: Letter+;
Operator: '+'
| '-'
| '*'
| '/'
| '%'
| '=='
| '!='
| '>'
| '<'
| '>='
| '<='
| '&&'
| '||'
| '^'
| '&'
| '|'
| '<<'
| '>>'
| '~' ;
Expression: (Operator | Identifier)
'(' (Identifier | Number)+ ')';
Parameter: Identifier
| Expression
| Number;
Statement: Keyword '(' Parameter+ ')';
Block: '{' Statement+ '}';
/* Third Lexer Stage */
Add: '+';
Sub: '-';
Mlt: '*';
Div: '/';
Mod: '%';
Mathop: Add | Sub | Mlt | Div | Mod;
Deq: '==';
Neq: '!=';
Gtr: '>';
Lss: '<';
Geq: '>=';
Leq: '<=';
Condop: Deq | Neq | Gtr | Lss | Geq | Leq;
And: '&&';
Or: '||';
Xor: '^';
Bnd: '&';
Bor: '|';
Logop: And | Or | Xor | Bnd | Bor;
Neg: '!';
Boc: '~';
Negop: Neg | Boc;
Asl: '<<';
Asr: '>>';
Shftop: Asl | Asr;
Eql: '=';
Inc: '++';
Dec: '--';
Incop: Inc | Dec;
Peq: '+=';
Meq: '-=';
Teq: '*=';
Seq: '/=';
Req: '%=';
Casop: Peq | Meq | Teq | Seq | Req;
Lparen: '(';
Rparen: ')';
Lbrack: '[';
Rbrack: ']';
Lbrace: '{';
Rbrace: '}';
Point : '.';
Colon : ':';
Numvar: Number
| Identifier
| Mathop '(' Parameter+ ')';
Boolvar: Boolean
| Identifier
| Condop '(' Parameter+ ')'
| Logop '(' Parameter+ ')';
Metaxpr: (Identifier | Operator ) '(' Parameter+ ')';
/* First Parser Stage */
//expressions
add: '+' '(' Numvar+ ')';
sub: '-' '(' Numvar+ ')';
mlt: '*' '(' Numvar+ ')';
div: '/' '(' Numvar+ ')';
mod: '%' '(' Integer+ ')';
mathexpr: add
| sub
| mlt
| div
| mod;
eql: '==' '(' Parameter+ ')';
neq: '!=' '(' Parameter+ ')';
gtr: '>' '(' Parameter+ ')';
les: '<' '(' Parameter+ ')';
geq: '>=' '(' Parameter+ ')';
leq: '<=' '(' Parameter+ ')';
condexpr: eql
| neq
| gtr
| les
| geq
| leq;
and: '&&' '(' Parameter+ ')';
or : '||' '(' Parameter+ ')';
xor: '^' '(' Parameter+ ')';
bnd: '&' '(' Parameter+ ')';
bor: '|' '(' Parameter+ ')';
logexpr: and
| or
| xor
| bnd
| bor;
asl: '<<' '(' Parameter Numvar ')';
asr: '>>' '(' Parameter Numvar ')';
shiftexpr: asl | asr;
neg: '!' '(' Parameter ')';
boc: '~' '(' Parameter ')';
negexpr: neg
| boc;
arrexpr: Identifier '[' Numvar ']';
//instruction forms
vardec: 'def' '(' Identifier+ ')' ': ' Identifier ;
lindec: Identifier '(' Identifier ')';
assign: '=' '(' (Identifier | lindec) Parameter ')';
incstmt: (Inc | Dec) '(' Identifier ')'
| Casop '(' Identifier Identifier ')';
cond: 'if' '(' Boolvar ')' Block
('else if' '(' Boolvar ')' Block)?
('else' Block)?;
loop: (
('while' '(' (condexpr | negexpr) ')')
| ('for' '(' assign ',' (condexpr | negexpr) ',' incstmt')')
) Block;
fundef: 'func' '(' Identifier Parameter+ ')' ': ' Identifier Block;
prodef: 'proc' '(' Identifier Parameter* ')' Block;
call: Identifier '(' Parameter+ ')';
excHandler: 'try' Block
'catch' '(' Identifier ')' Block
('finally' Block)?;
classdef: 'class' '(' Identifier ')' (': ' _Identifier_)? _Block_;
ANTLR requires unambiguous grammar rules. In the provided grammar, the Symbol rule conflicts with the Operator rule and others. The Identifier and Letter rules also conflict. Rules conflict when they can match the same input (content & length).
Also, for example, the Symbol rule includes '{' as an alt. Subsequent rules that use the literal '{' (which is an implicit token type) in any of their alts will not match, because the implicit token type is not the same as the Symbol token type. Best practice is to avoid redundant use of literals - define the literal in a rule, and then just reference that rule.
Best advice would be to buy a copy of TDAR (The Definitive ANTLR 4 Reference) to learn ANTLR4.
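As a rough illustration of the advice above, here is a minimal sketch of how the classdef rule might look once every literal is defined in exactly one lexer rule and only referenced by name elsewhere. It is not a fix for the whole grammar. The names Sketch, CLASS, LPAREN, RPAREN, LBRACE, RBRACE, COLON and WS are hypothetical, and so are three design choices: block and statement become parser rules, Letter and Digit become fragment rules, and Identifier is allowed to be a single letter.

```antlr
grammar Sketch;

// Parser rules reference token names only, so no implicit
// literal tokens are created that could clash with lexer rules.
classdef  : CLASS LPAREN Identifier RPAREN (COLON Identifier)? block ;
block     : LBRACE statement+ RBRACE ;
statement : Identifier LPAREN parameter+ RPAREN ;
parameter : Identifier | Number ;

// Each literal is defined exactly once. CLASS is listed before
// Identifier so that it wins when both match the same text.
CLASS  : 'class' ;
LPAREN : '(' ;
RPAREN : ')' ;
LBRACE : '{' ;
RBRACE : '}' ;
COLON  : ':' ;

Number     : Digit+ ('.' Digit+)? ;
Identifier : Letter (Letter | Digit | '_')* ;
WS         : [ \t\r\n]+ -> skip ;

// Fragments are building blocks only; they never become tokens,
// so they cannot compete with Identifier or Number for input.
fragment Letter : [a-zA-Z] ;
fragment Digit  : [0-9] ;
```

With this layout, an input such as class(Foo): Bar { run(x 1) } should lex into distinct CLASS, LPAREN, Identifier, RPAREN, COLON, LBRACE and RBRACE tokens, because no catch-all rule like Symbol competes for the same characters. When two lexer rules can match the same text, ANTLR prefers the longest match and, on a tie, the rule defined first, which is why the order of CLASS and Identifier matters here.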