I have defined two sets of identifiers IDENTIFIER_ONE and IDENTIFIER_TWO which are both exculsive subsets of IDENTIFIER. I would like to write a parser such that:
"i1(arg) EOS" can't be parsed (1)
"i2(arg) EOS" can be parsed (2)
"i1(arg) = value EOS" can be parsed (3)
"i2(arg) = value EOS" can be parsed (4)
where i1(resp., i2) belongs to IDENTIFIER_ONE (resp., IDENTIFIER_TWO); arg and value belong to IDENTIFIER. The following parser.mly has already realized all the points I am after, except (4):
identifier:
| IDENTIFIER_ONE { $1 }
| IDENTIFIER_TWO { $1 }
block_statement_EOS:
| identifier LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EOS { BSE_I_I ($1, $3) }
Given i1(arg) = value EOS as input, as goal (3) it is correctly read as BSE_Let (i1, arg, value). However, given i2(arg) = value EOS as input, it stops the parsing after reading EQUAL. I guess it is because once the parse meets i2(arg), it goes to the 2nd rule of block_statement_EOS, and later EQUAL can't be parsed.
Ideally, I would hope the parser could try the 1st rule of block_statement_EOS if the 2nd rule fails. Could anyone help me to make this possible?
PS: If I write the parser.mly as follows, all the goals can be achieved. Does anyone know why? Additionally I really don't like this workaround, because I do need to write identifier instead of two subsets in many other rules, I would hope a more elegant solution...
block_statement_EOS:
| IDENTIFIER_ONE LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EOS { BSE_I_I ($1, $3) }
When your parser encounters an
LPARENafter anIDENTIFIER_TWO, it has to decide whether to shift or to reduce:LPARENon the stack;IDENTIFIER_TWO, which is on top of the stack, byidentifier.If your parser chooses to shift, it will never reduce this particular
IDENTIFIER_TWOintoidentifier(because this particularIDENTIFIER_TWOwill never be on top of the stack again), meaning that it will always reduce the second rule ofblock_statement_EOS.If your parser chooses to reduce, it will never reduce the second rule of
block_statement_EOS, as this rule starts withIDENTIFIER_TWOand notidentifier.This is why your second version works, because there is no need to choose between shifting and reducing after
IDENTIFIER_TWO. The choice is made later, if you wish.