Antlr4 left-recursive rule appears to produce right-associative parse

The following grammar illustrates the issue:

// test Antlr4 left recursion associativity
grammar LRA;
@parser::members {
    public static void main(String[] ignored) throws Exception{
        final LRALexer lexer = new LRALexer(new ANTLRInputStream(System.in));
        final LRAParser parser = new LRAParser(new CommonTokenStream(lexer));
        parser.setTrace(true);
        parser.file();
    }
}
ID: [A-Za-z_] ([A-Za-z_]|[0-9])*;
CMA: ',';
SMC: ';';
UNK: . -> skip;
file: punctuated EOF;
punctuated
    : punctuated cma punctuated
    | punctuated smc punctuated
    | expression
    ;
cma: CMA;
smc: SMC;
expression: id;
id: ID;

Given the input "a,b,c", I get the following listener event trace output:

( 'a' ) ( ',' ( 'b' ) ( ',' ( 'c' ) ) )

where ( represents enter punctuated, ) represents exit punctuated, and all other rules are omitted for brevity and clarity.

By inspection, this order of listener events represents a right-associative parse.

Common practice and The Definitive ANTLR 4 Reference lead me to expect a left-associative parse, corresponding to the following listener event trace:

( 'a' ) ( ',' ( 'b' ) ) ( ',' ( 'c' ) )

Is there something wrong with my grammar, my expectations, my interpretation of the listener events, or something else?

There is 1 best solution below

The workaround is to reference the operator tokens CMA and SMC directly in the left-recursive rule instead of going through the wrapper rules cma and smc, and I would consider that an adequate answer. The generated parser needs to pass a precedence parameter to each recursive call. Since the precedence is associated with a token, that token has to appear directly in the recursive rule so that Antlr can find its precedence.

The working grammar looks like this:

// test Antlr4 left recursion associativity
grammar LRA;
@parser::members {
    public static void main(String[] ignored) throws Exception{
        final LRALexer lexer = new LRALexer(new ANTLRInputStream(System.in));
        final LRAParser parser = new LRAParser(new CommonTokenStream(lexer));
        parser.setTrace(true);
        parser.file();
    }
}
ID: [A-Za-z_] ([A-Za-z_]|[0-9])*;
CMA: ',';
SMC: ';';
UNK: . -> skip;
file: punctuated EOF;
punctuated
    : punctuated CMA punctuated
    | punctuated SMC punctuated
    | expression
    ;
expression: id;
id: ID;
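
To illustrate why the precedence argument of the recursive call decides associativity, here is a minimal hand-written precedence-climbing sketch in Java. It is not the code Antlr generates; the class name PrecedenceClimbing, the helper group, the single-letter operands, and the precedence values (',' = 2, ';' = 1, chosen to mirror the order of the alternatives) are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical illustration only: a tiny precedence-climbing parser,
// not the parser that Antlr generates for the grammar above.
final class PrecedenceClimbing {
    // Assumed precedence table: ',' binds tighter than ';'.
    private static final Map<Character, Integer> PRECEDENCE = Map.of(',', 2, ';', 1);

    private final String input;
    private final boolean leftAssociative;
    private int pos = 0;

    private PrecedenceClimbing(String input, boolean leftAssociative) {
        this.input = input;
        this.leftAssociative = leftAssociative;
    }

    // Parses single-letter operands joined by operators whose precedence
    // is at least minPrecedence, returning a fully parenthesized string.
    private String parse(int minPrecedence) {
        String left = String.valueOf(input.charAt(pos++));
        while (pos < input.length()) {
            char op = input.charAt(pos);
            Integer prec = PRECEDENCE.get(op);
            if (prec == null || prec < minPrecedence) {
                break; // leave this operator to an enclosing call
            }
            pos++; // consume the operator
            // Passing prec + 1 stops the right operand from absorbing another
            // operator of the same precedence, which gives left associativity.
            // Passing prec lets it absorb them, which gives right associativity.
            String right = parse(leftAssociative ? prec + 1 : prec);
            left = "(" + left + op + right + ")";
        }
        return left;
    }

    static String group(String input, boolean leftAssociative) {
        return new PrecedenceClimbing(input, leftAssociative).parse(0);
    }

    public static void main(String[] args) {
        System.out.println(group("a,b,c", true));  // ((a,b),c)
        System.out.println(group("a,b,c", false)); // (a,(b,c))
    }
}
```

The only difference between the two results is the precedence value handed to the recursive call, so a parser that cannot determine an operator's precedence at that point could plausibly end up with the right-associative grouping seen in the question.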