I'm trying to create a Syntax Analyzer in Java (using CUP) that could recognise this piece of code:
if ¿b? then
~ a = 2;
~ if ¿b && c? then
~ ~ a = 3;
else
~ a = 4;
My productions used by the "if" statement are these that follow:
Instr ::= ...
| IF CONOP Exp:e CONCL THEN CondInstrList:l
...
;
...
CondInstrList ::= CondInstrList CondInstr
| /*empty*/
;
...
CondInstr ::= CONTROLD Instr
| CONTROLD CondInstr
;
where Instr stands for instruction/statement, CondInstrList stands for Conditional Instruction List and CONTROLD stands for Control Dash (~). (CONOP and CONCL mean Condition Open/Close)
The problem is that with that grammar, the generated AST is as follows:
if
|-condition b
|-condInstrListT
|---asig a = 2
|---if
|---condition b and c
|---condInstrListT
| |---asig a = 2
|---condInstrListF
|---asig a = 4
and so, the "else" part is associated with the inner "if".
I just don't know how to write a grammar that respects the way I want my language to be.
Any help is appreciated.
I can give more detail if needed.
I don't think you can do what you have in mind through the grammar alone. But it is possible with a slightly different grammar and some help from your lexical analyzer.
Here's what to do: rather than treat the ~ marks as individual grammar symbols, have the lexical analyzer turn sequences of ~ at the start of a line into INDENT and OUTDENT tokens that will work in your grammar the same way that { and } work in Java. You keep track of a "current indent level", which starts at zero. At the start of each line, count the ~ characters. For each ~ in excess of the current indent level, generate an INDENT token and increase the current indent level; for each ~ less than the current indent level, generate an OUTDENT token and decrease the current indent level.
So your example text of
would be tokenized as:
The INDENT and OUTDENT tokens would act in your grammar like left- and right-braces do in Java, so your grammar could look something like:
The language Python does the same thing but with just white space instead of ~. You can download Python source here if you're interested. Look for the files
Grammar\Grammar
andParser\tokenizer.c
.