How to implement a negative LOOKAHEAD check for a token in JavaCC?


I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC. I recently learned about LOOKAHEADs, which are handy here as the grammar is not fully LL(1).

One of the things I see in the ECMAScript grammar is a "negative lookahead check", as in the following ExpressionStatement production:

ExpressionStatement :
    [lookahead ∉ {{, function}] Expression ;

So I'll probably need something like LOOKAHEAD(!("{" | "function")) but it does not work in this syntax.

My question is: how could I implement this "negative LOOKAHEAD" in JavaCC?

After reading the LOOKAHEAD MiniTutorial I think that an expression like getToken(1).kind != FUNCTION may be what I need, but I am not quite sure about it.


There are 2 best solutions below

2
On BEST ANSWER

For the example you provide, I would prefer to use syntactic look ahead, which is in a sense necessarily "positive".

The production for ExpressionStatement is not the place to tackle the problem as there is no choice.

void ExpressionStatement() : {} { Expression() ";" }

The problem will arise where there is a choice between an expression statement and a block or between an expression statement and a function declaration (or both).

E.g. in Statement you will find

void Statement() :{} {
    ...
|
    Block()
|
    ExpressionStatement() 
|   ...
}

This gives a warning because both alternatives can start with a "{". You have two options. One is to ignore the warning: the first matching alternative is taken, so all will be well as long as Block comes first. The other is to suppress the warning with a lookahead specification, like this:

void Statement() :{} {
    ...
|
    LOOKAHEAD("{") Block()
|
    ExpressionStatement() 
|   ...
}
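
The same positive form also covers the clash with function declarations. In the ECMAScript 5.1 grammar that choice sits in SourceElement (Statement or FunctionDeclaration), so a fragment along these lines might be used. It is only a sketch: FunctionDeclaration() is assumed to be defined elsewhere in your grammar.

```javacc
// Sketch only: FunctionDeclaration() and Statement() are assumed to exist.
void SourceElement() :{} {
    // Take this alternative if the next token is "function".
    LOOKAHEAD("function") FunctionDeclaration()
|
    Statement()
}
```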

Syntactic look ahead is, in a sense, positive -- "take this alternative if X".

If you really want a negative --i.e., "take this alternative if not X"-- look ahead it has to be semantic.

In the case of Statement you could write

void Statement() :{} {
    ...
|
    LOOKAHEAD({ getToken(1).kind != LBRACE }) ExpressionStatement() 
|   
    Block()
}

I made sure that these are the last two alternatives since otherwise you'd need to include more tokens in the set of tokens that block ExpressionStatement(), e.g. it should not be chosen if the next token is an "if" or a "while" or a "for", etc.
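
To cover the full lookahead set from the specification, { and function, the predicate can test both token kinds. Here is a minimal, self-contained sketch. The parser name MiniParser, the token names, and the toy ExpressionStatement (a single identifier followed by ";") are placeholders for illustration, not part of a real ECMAScript grammar.

```javacc
options { STATIC = false; }

PARSER_BEGIN(MiniParser)
public class MiniParser {}
PARSER_END(MiniParser)

SKIP : { " " | "\t" | "\n" | "\r" }

TOKEN : {
    < LBRACE : "{" >
|   < RBRACE : "}" >
|   < SEMICOLON : ";" >
|   < FUNCTION : "function" >   // listed before IDENTIFIER so the keyword wins
|   < IDENTIFIER : (["a"-"z","A"-"Z","_"])+ >
}

void Statement() :{} {
    // Negative check: enter only if the next token is neither "{" nor "function".
    LOOKAHEAD({ getToken(1).kind != LBRACE && getToken(1).kind != FUNCTION })
    ExpressionStatement()
|
    // Reached when the predicate is false; a leading "function" fails here
    // with a ParseException because Block() requires "{".
    Block()
}

// Placeholder standing in for the real Expression() ";" production.
void ExpressionStatement() :{} { <IDENTIFIER> ";" }

void Block() :{} { "{" ( Statement() )* "}" }
```

As before, this only works because ExpressionStatement and Block are the last two alternatives; with more alternatives after them, the predicate would have to exclude their starting tokens too.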

On the whole, you are better off using syntactic lookahead when you can. It is usually more straightforward and harder to mess up.

0
On

I came across this question looking for something else, and yes, I am aware that the question was posed nearly 6 years ago.

The most advanced version of JavaCC is JavaCC21, and JavaCC21 does allow negative syntactic lookahead.

In JavaCC21 you would write LOOKAHEAD(~<LBRACE>) to specify that you only enter the expansion that follows if the next token is not an LBRACE, for example. The ~ character negates the lookahead expansion and you can use it to negate more complex expansions than a single token, if you want to. For example:

LOOKAHEAD (~(<LBRACE>|<LPAREN>))

There are actually quite a few other features in JavaCC21 that are not present in the legacy JavaCC project. Here is a biggie: the longstanding bug in which nested syntactic lookahead does not work correctly has been fixed. See here.