I'm trying to extend the SQL language of SQLite at one point (file parse.y). I have a parsing conflict, but the Lemon parser generator reports nothing beyond the bare message "1 parsing conflicts."
The problem is located where create_table begins with either "CREATE" or "CREATE OR REPLACE", which is followed by temp, which can itself be reduced from an empty token sequence.
cmd ::= create_table create_table_args table_properties_args.
create_table ::= createorreplace(C) temp(T) TABLE ifnotexists(E) nm(X) dbnm(Y). {
// ...
}
%type createorreplace {int}
createorreplace(A) ::= CREATE. {disableLookaside(pParse); A = 0;}
createorreplace(A) ::= CREATE OR REPLACE. {disableLookaside(pParse); A = 1;}
%type temp {int}
temp(A) ::= TEMP. {A = pParse->db->init.busy==0;}
temp(A) ::= . {A = 0;}
How can I make "OR REPLACE" optional while still allowing it to be followed by TEMP?
Since I can only guess how and where you might have changed SQLite's SQL grammar, this answer is necessarily somewhat tentative. But it might be useful anyway.
The original SQL grammar contains the following productions (I left out the actions since they are never relevant in diagnosing conflicts):
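A minimal sketch of those productions, written from my reading of SQLite's parse.y with labels and actions stripped. The exact right-hand sides vary between SQLite versions (in particular, `eidlist_opt` in the view production is an assumption about the version in use), so treat this as illustrative rather than a verbatim copy:

```lemon
// Sketch only: labels and actions omitted, symbols may differ by version.
cmd ::= create_table create_table_args.
create_table ::= createkw temp TABLE ifnotexists nm dbnm.
createkw ::= CREATE.

temp ::= TEMP.
temp ::= .

// CREATE VIEW also starts with createkw, followed by the same optional temp.
cmd ::= createkw temp VIEW ifnotexists nm dbnm eidlist_opt AS select.
```

The point to notice is that both the table and the view production begin with the same non-terminal, `createkw`.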
You seem to have modified `create_table` so that it starts with `createorreplace` instead of `createkw`, as in the production shown in your question.

That change indeed creates a conflict, but it has nothing to do with `temp` being nullable. In fact, it has very little to do with the non-terminal `temp` at all. You could replace `temp` with `TEMP` (thereby making it obligatory rather than optional) and you would still have a reduce-reduce conflict.

The conflict occurs for inputs which start `CREATE TEMP`. That input could be the start of either of the following:

    CREATE TEMP TABLE ...
    CREATE TEMP VIEW ...

Those are obviously different syntaxes, and there is no ambiguity. But when the terminal `CREATE` has just been read and the terminal `TEMP` is the lookahead token, both of those possibilities are still available. That's not necessarily a problem; a bottom-up parser does not need to resolve which possible production will be used until it gets to the end of the production. So the original grammar works fine, without conflicts.

But note that the original grammar does not have a `cmd` production which starts with the terminal `CREATE`. What it has are several `cmd` productions which start with the non-terminal `createkw`. But there is no possibility of confusion there, either. The terminal `CREATE` is reduced to `createkw` in both `cmd` productions (and other `cmd` productions I didn't list, which also start with `createkw`).

However, in your modified grammar, the two productions do not both start with `createkw`. One of them was changed to start with `createorreplace`.

Inputs which do not include the optional keyword `TEMP` still parse without any problem. If `TEMP` is not present, the lookahead token will be `TABLE` in the `create_table` command, and the lookahead token will be `VIEW` in the create view command. Since the lookahead tokens differ, the parser has no trouble deciding whether to reduce to `createkw` or to reduce to `createorreplace`. Similarly, if the input were actually `CREATE OR REPLACE ...`, the lookahead token would be `OR`, which can only lead to a reduction to `createorreplace`.

But the problematic input, as shown above, starts `CREATE TEMP`. Now, the parser must decide, without seeing anything which follows the terminal `TEMP`, whether to reduce `CREATE` to `createkw` or to reduce it to `createorreplace`. Since that determination cannot be made, a conflict is reported. (And you'll find a lot more information about that conflict by looking through the Lemon report file, `parse.out`.)

The solution (if my guess about your grammar modifications was correct) is to avoid forcing the parser to make an unnecessary decision. That requires a little bit of grammar duplication:
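A sketch of the duplicated productions, assuming the symbol names from your question (`createorreplace`, `table_properties_args`) and the stock `createkw` rule. The action bodies are left as comments because I don't know what yours contain:

```lemon
cmd ::= create_table create_table_args table_properties_args.

// Plain CREATE [TEMP] TABLE: starts with createkw, like every other CREATE command.
create_table ::= createkw temp(T) TABLE ifnotexists(E) nm(X) dbnm(Y). {
  /* your action for the non-replacing case */
}

// CREATE OR REPLACE [TEMP] TABLE: a separate production with its own action.
create_table ::= createorreplace temp(T) TABLE ifnotexists(E) nm(X) dbnm(Y). {
  /* your action for the replacing case */
}

// createorreplace no longer matches a bare CREATE.
createorreplace ::= CREATE OR REPLACE. {disableLookaside(pParse);}

// Unchanged from the stock grammar.
createkw(A) ::= CREATE(A). {disableLookaside(pParse);}
```

Since each `create_table` production now has its own action, `createorreplace` probably no longer needs to carry an `int` value, so the `%type createorreplace {int}` declaration can likely be dropped.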
Now, the terminal `CREATE` not followed by `OR REPLACE` is always reduced to `createkw`, while the sequence `CREATE OR REPLACE` is always reduced to `createorreplace`. This works because there is no possible parse for a `cmd` starting `CREATE OR`, other than `CREATE OR REPLACE`.