I'm having trouble debugging an ANTLR grammar I'm writing for Game Boy assembly. It mostly works, but for some reason it cannot handle 0x notation for hexadecimal in certain edge cases.
If my input string is "JR 0x10", ANTLR fails with a 'no viable alternative at input' error. As I understand it, that means either I have no rule that matches the token stream or '0x' is not being tokenized properly. If I use "JR $10" (one of the alternate notations I support), it works perfectly. But '0x' and '$' are expressed in the same rule.
Here is my g4 file:
grammar GBASM;
eval : exp EOF;
exp : exp op | exp sys | op | sys;
sys : include | section | label | data;
op : monad | biad arg | triad arg SEPARATOR arg;
monad : NOP|RLCA|RRCA|STOP|RLA|RRA|DAA|CPL|SCF|CCF|HALT|RETI|DI|EI|RST|RET;
biad : INC|DEC|SUB|AND|XOR|OR|CP|POP|PUSH|RLC|RRC|RL|RR|SLA|SRA|SWAP|SRL|JP|JR;
triad : RET|JR|JP|CALL|LD|LDD|LDI|LDH|ADD|ADC|SBC|BIT|RES|SET;
arg : (register|value|negvalue|flag|offset|jump|memory);
memory : MEMSTART (register|value|jump) MEMEND;
offset : register Plus value | register negvalue;
register : A|B|C|D|E|F|H|L|AF|BC|DE|HL|SP|HLPLUS|HLMINUS;
flag : NZ | NC | Z | C;
data : DB db;
db : string_data | value | string_data SEPARATOR db | value SEPARATOR db;
include : INCLUDE string_data;
section : SECTION string_data SEPARATOR HOME '[' value ']';
string_data: STRINGLITERAL;
jump : LIMSTRING;
label : LIMSTRING ':';
Z : 'Z';
A : 'A';
B : 'B';
C : 'C';
D : 'D';
E : 'E';
F : 'F';
H : 'H';
L : 'L';
AF : 'AF';
BC : 'BC';
DE : 'DE';
HL : 'HL';
SP : 'SP';
NZ : 'NZ';
NC : 'NC';
value : HexInteger | Integer;
negvalue : (Neg Integer) | (Neg HexInteger);
Neg : '-';
Plus : '+';
HexInteger : (HexPrefix HexDigit+) | (HexDigit+ HexPostfix);
Integer : Digit+;
fragment Digit : ('0'..'9');
HLPLUS : 'HL+' | 'HLI';
HLMINUS : 'HL-' | 'HLD';
MEMSTART : '(';
MEMEND : ')';
LD : 'LD' | 'ld';
JR : 'JR' | 'jr';
JP : 'JP' | 'jp';
OR : 'OR' | 'or';
CP : 'CP' | 'cp';
RL : 'RL' | 'rl';
RR : 'RR' | 'rr';
DI : 'DI' | 'di';
EI : 'EI' | 'ei';
DB : 'DB';
LDD : 'LDD' | 'ldd';
LDI : 'LDI' | 'ldi';
ADD: 'ADD' | 'add';
ADC : 'ADC' | 'adc';
SBC : 'SBC' | 'sbc';
BIT : 'BIT' | 'bit';
RES : 'RES' | 'res';
SET : 'SET' | 'set';
RET: 'RET' | 'ret';
INC : 'INC' | 'inc';
DEC : 'DEC' | 'dec';
SUB : 'SUB' | 'sub';
AND : 'AND' | 'and';
XOR : 'XOR' | 'xor';
RLC : 'RLC' | 'rlc';
RRC : 'RRC' | 'rrc';
POP: 'POP' | 'pop';
SLA : 'SLA' | 'sla';
SRA : 'SRA' | 'sra';
SRL : 'SRL' | 'srl';
NOP : 'NOP' | 'nop';
RLA : 'RLA' | 'rla';
RRA : 'RRA' | 'rra';
DAA : 'DAA' | 'daa';
CPL : 'CPL' | 'cpl';
SCF : 'SCF' | 'scf';
CCF : 'CCF' | 'ccf';
LDH : 'LDH' | 'ldh';
RST : 'RST' | 'rst';
CALL : 'CALL' | 'call';
PUSH : 'PUSH' | 'push';
SWAP : 'SWAP' | 'swap';
RLCA : 'RLCA' | 'rlca';
RRCA : 'RRCA' | 'rrca';
STOP : 'STOP 0' | 'STOP' | 'stop 0' | 'stop';
HALT: 'HALT' | 'halt';
RETI: 'RETI' | 'reti';
HOME: 'HOME';
SECTION: 'SECTION';
INCLUDE: 'INCLUDE';
fragment HexPrefix : ('0x' | '$');
fragment HexPostfix : ('h' | 'H');
fragment HexDigit : ('0'..'9'|'a'..'f'|'A'..'F');
STRINGLITERAL : '"' ~["\r\n]* '"';
LIMSTRING : ('_'|'a'..'z'|'A'..'Z'|'0'..'9')+;
SEPARATOR : ',';
WS : (' '|'\t'|'\n'|'\r') ->channel(HIDDEN);
COMMENT : ';' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);
In the failing case it looks like parsing stops at 'op'; in the passing case it correctly drills down to 'value' and my parser picks up the information. Is there some quirk of ANTLR4 grammars that I'm missing?
I'm generating a C# parser in case that's relevant.
It turns out it was the order of my hexadecimal rules.
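For anyone hitting the same error, here is a minimal sketch of why rule order can matter here. It is a cut-down illustration, not my full grammar: the grammar name HexOrder and the parser rule line are made up for the example. The ANTLR4 lexer picks the rule that matches the longest input, and when two rules match the same length it picks the one defined first. "0x10" can be matched in full by both HexInteger and LIMSTRING, so whichever is declared earlier wins. "$10" never hits this tie, because '$' is not a valid LIMSTRING character.

```antlr
grammar HexOrder; // hypothetical name, for illustration only

// Parser rules: a jump instruction followed by a numeric value.
line  : JR value EOF;
value : HexInteger | Integer;

// Keyword tokens come first so that "JR" is not lexed as LIMSTRING.
JR : 'JR' | 'jr';

// Numeric tokens are declared BEFORE LIMSTRING. For "0x10" both HexInteger
// and LIMSTRING match all four characters; the earlier rule wins the tie.
HexInteger : HexPrefix HexDigit+ | HexDigit+ HexPostfix;
Integer    : Digit+;

// If this rule were moved above HexInteger, "0x10" would become a LIMSTRING
// token and 'value' could no longer match it. "$10" would be unaffected.
LIMSTRING : [_a-zA-Z0-9]+;

fragment HexPrefix  : '0x' | '$';
fragment HexPostfix : 'h' | 'H';
fragment HexDigit   : [0-9a-fA-F];
fragment Digit      : [0-9];

WS : [ \t\r\n]+ -> channel(HIDDEN);
```

Dumping the token stream before parsing is a quick way to check which token type "0x10" actually ends up with.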
The reason I didn't see anything change was that Visual Studio was looking at an old copy of my grammar (because Microsoft's file-path system is somewhat... alternative).
My modified grammar works perfectly.
Thanks!