ANTLR grammar to detect ambiguous tokens


I'm creating a simple grammar in ANTLR to match some kind of command. I'm stuck on tokens that contain special characters.

Those commands would match sentences like...

  1. connect "HAL" computer 4
  2. connect "HAL256" computer 8
  3. connect "HAL2⁸" computer 16
  4. connect "HAL 9000" computer 32
  5. connect "HAL \x0A25 | 32" computer 64

... to produce something like:

[Image: interpretation]

It's clear that my problem is in the ID token, but I don't know how to solve it. Here is my current grammar:

grammar foo;
ID      :   '"' ('\u0000'..'\uFFFF')+ '"' ;
NUMBER  :    ('0'..'9')* ;
SENTENCE    :    'connect ' ID ' computer' NUMBER ;

How could I do it?

Best answer

There are a couple of issues with your grammar:

  1. NUMBER matches the empty string: a lexer rule must always match at least one character
  2. SENTENCE should be a parser rule (see: Practical difference between parser rules and lexer rules in ANTLR?)
  3. ('\u0000'..'\uFFFF')+ also matches a '"', so the rule does not stop at the closing quote, which you most probably don't want
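Issues 1 and 3 can be reproduced outside ANTLR with ordinary regular expressions. The sketch below is only an analogy in Python: the pattern names are made up for illustration, and a regex engine backtracks where ANTLR's lexer may not, so it shows the shape of the problem rather than ANTLR's exact behaviour.

```python
import re

# Regex stand-ins for the lexer rules in the question (an analogy, not ANTLR).
number_star = re.compile(r"[0-9]*")           # like NUMBER : ('0'..'9')*
number_plus = re.compile(r"[0-9]+")           # like NUMBER : ('0'..'9')+
id_any = re.compile(r'".+"', re.DOTALL)       # like ID built on '\u0000'..'\uFFFF'
id_no_quote = re.compile(r'"[^"]+"')          # like ID built on ~'"'

# Issue 1: the * version "succeeds" without consuming a single character.
empty_match = number_star.match("abc")
print(repr(empty_match.group()))              # ''
print(number_plus.match("abc"))               # None

# Issue 3: the any-character set runs straight past the first closing quote.
text = 'connect "HAL" computer 4 connect "HAL256" computer 8'
greedy = id_any.search(text).group()
strict = id_no_quote.search(text).group()
print(greedy)                                 # "HAL" computer 4 connect "HAL256"
print(strict)                                 # "HAL"
```

Excluding the quote from the character set is what makes the token end at the first closing quote.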

Try something like this instead:

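// Parser rule (lower-case name): one command made of four tokens.
sentence   : K_CONNECT ID K_COMPUTER NUMBER;

// Lexer rules (upper-case names).
K_CONNECT  : 'connect';
K_COMPUTER : 'computer';
ID         : '"' (~'"')+ '"';      // anything except a double quote between the quotes
NUMBER     : ('0'..'9')+;          // one or more digits, never empty
SPACE      : (' ' | '\t' | '\r' | '\n')+ {skip();};  // whitespace is discarded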
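To check the rules against the five sample commands without generating a parser, here is a minimal Python sketch that mirrors them with regular expressions. The helpers `tokenize` and `is_sentence` are hypothetical names introduced for this illustration, and the regex alternation picks the first matching pattern rather than following ANTLR's own matching rules, so treat it as a rough stand-in rather than a faithful emulation.

```python
import re

# Token patterns mirroring the suggested lexer rules (Python stand-in, not ANTLR).
TOKEN_SPEC = [
    ("K_CONNECT", r"connect"),
    ("K_COMPUTER", r"computer"),
    ("ID", r'"[^"]+"'),
    ("NUMBER", r"[0-9]+"),
    ("SPACE", r"[ \t\r\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs, dropping SPACE as the skip() action does."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "SPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def is_sentence(text):
    """True when the token types are exactly K_CONNECT ID K_COMPUTER NUMBER."""
    try:
        types = [kind for kind, _ in tokenize(text)]
    except ValueError:
        return False
    return types == ["K_CONNECT", "ID", "K_COMPUTER", "NUMBER"]


samples = [
    'connect "HAL" computer 4',
    'connect "HAL256" computer 8',
    'connect "HAL2⁸" computer 16',
    'connect "HAL 9000" computer 32',
    'connect "HAL \\x0A25 | 32" computer 64',
]
for s in samples:
    print(is_sentence(s), s)
```

Because whitespace is its own skipped token, the keywords no longer need embedded spaces, and spaces or special characters inside the quotes stay part of the ID.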