IntelliJ: Grammar-Kit / BNF: how to recover from errors?


I am writing a Custom Language plugin for IntelliJ.

Here is a simplified example of the language. Note that the structure is recursive:

[Image: a simplified example file in the language]

I have successfully implemented the FLEX and BNF files, but I'm not sure how to add error recovery. I've read about `recoverWhile` and `pin` in Grammar-Kit's HOWTO, but I'm not sure how to apply them to my scenario.

I call the brown items above ("aaa", "ccc", etc.) "items".

I call the yellow ones ("bbb", "ddd", ...) "properties".

Each item has an item name (e.g. "aaa"), a single property (e.g. "bbb"), and can contain other items (e.g. "aaa" contains "ccc", "eeee", and "gg").

At the moment, the plugin doesn't behave well when an item is malformed. For example:

[Image: an example in which the "ccc" item is malformed]

In this example, I would like the parser to "understand" that "ccc" is the name of an item with a missing property (e.g. by detecting a newline before the closing bracket).

I don't want the broken "ccc" item to influence the parsing of "eeee", but I do want the PSI tree to contain the parts of "ccc" that are present in the text (in this case, its name).

Here are the FLEX and BNF files I use:

FLEX:

```jflex
CRLF=\n|\r|\r\n
WS=[\ \t\f]
WORD=[a-zA-Z0-9_#\-]+

%state EOF

%%
<YYINITIAL> {WORD}  { return MyLangTypes.TYPE_FLEX_WORD; }
<YYINITIAL> \[      { return MyLangTypes.TYPE_FLEX_OPEN_SQUARE_BRACKET; }
<YYINITIAL> \]      { return MyLangTypes.TYPE_FLEX_CLOSE_SQUARE_BRACKET; }
<YYINITIAL> \{      { return MyLangTypes.TYPE_FLEX_OPEN_CURLY_BRACKET; }
<YYINITIAL> \}      { return MyLangTypes.TYPE_FLEX_CLOSE_CURLY_BRACKET; }
// Line breaks are folded into WHITE_SPACE, so the parser never sees a CRLF token.
// (A separate {WS}+ rule would be unreachable after this one, so it is omitted.)
({CRLF}|{WS})+      { return TokenType.WHITE_SPACE; }
.                   { return TokenType.BAD_CHARACTER; }
```

BNF:

```bnf
myLangFile ::= (item|COMMENT|CRLF)*
item ::=
    itemName
    (TYPE_FLEX_OPEN_SQUARE_BRACKET itemProperty? TYPE_FLEX_CLOSE_SQUARE_BRACKET?)?
    itemBody?
itemName ::= TYPE_FLEX_WORD
itemProperty ::= TYPE_FLEX_WORD
itemBody ::= TYPE_FLEX_OPEN_CURLY_BRACKET item* TYPE_FLEX_CLOSE_CURLY_BRACKET
```
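One way to apply the HOWTO's advice to this grammar is to put `recoverWhile` on `item`, because `item` is the rule repeated inside the `item*` loop of `itemBody`, and to give it and its sub-rules a `pin` (the HOWTO pairs `recoverWhile` with a pin). The sketch below is an untested illustration, not a verified fix: `item_recover` is a hypothetical rule name, and the `COMMENT`/`CRLF` alternatives are left out of the root rule only because the lexer shown never emits those tokens.

```bnf
// Hypothetical revision of the question's grammar, for illustration only.
myLangFile ::= item*

item ::= itemName itemProperties? itemBody? {
  pin=1                          // commit to an item as soon as its name matches
  recoverWhile="item_recover"    // afterwards, skip tokens up to a plausible item boundary
}
itemName ::= TYPE_FLEX_WORD
// Pinned on "[": a missing "]" is reported as an error, but the item node (with its name) is kept.
itemProperties ::= TYPE_FLEX_OPEN_SQUARE_BRACKET itemProperty? TYPE_FLEX_CLOSE_SQUARE_BRACKET {pin=1}
itemProperty ::= TYPE_FLEX_WORD
itemBody ::= TYPE_FLEX_OPEN_CURLY_BRACKET item* TYPE_FLEX_CLOSE_CURLY_BRACKET {pin=1}

// A predicate only: it consumes nothing. Recovery keeps skipping while the next token
// can neither start another item (a WORD) nor close the enclosing body ("}").
private item_recover ::= !(TYPE_FLEX_WORD | TYPE_FLEX_CLOSE_CURLY_BRACKET)
```

One limitation remains: because the lexer folds line breaks into WHITE_SPACE, the parser cannot use a newline as a boundary, so a WORD that follows an unclosed `[` is always taken as that item's property. Detecting the newline as described above would require the lexer to emit a separate token for it (or an external rule that checks the whitespace before the current token). Grammar-Kit's Live Preview is a convenient way to check how a particular broken input is actually recovered.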
Answer:

I was eventually able to make it work like this:

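```bnf
myLangFile ::= (item|COMMENT|CRLF)*
item ::=
    itemName
    itemProperties
    itemBody?
itemName ::= TYPE_FLEX_WORD
// pin(".*")=1 pins this rule and its nested sequences on their first element: once "[" is seen,
// the item is committed to having a property list, and a missing property or "]" is reported
// as an error instead of making the whole item fail.
itemProperties ::= TYPE_FLEX_OPEN_SQUARE_BRACKET [!TYPE_FLEX_CLOSE_SQUARE_BRACKET itemProperty ((TYPE_FLEX_SEMICOLON itemProperty)|itemProperty)*] TYPE_FLEX_CLOSE_SQUARE_BRACKET {
    pin(".*") = 1
}
// An extra "= word" after the value (e.g. a=b=c) is absorbed here instead of producing a syntax error.
itemProperty ::= TYPE_FLEX_WORD TYPE_FLEX_EQUALS? itemPropertyValue? (TYPE_FLEX_EQUALS prv_swallowNextPropertyToPreventSyntaxErrors)?
private prv_swallowNextPropertyToPreventSyntaxErrors ::= TYPE_FLEX_WORD
itemPropertyValue ::= TYPE_FLEX_WORD
itemBody ::= TYPE_FLEX_OPEN_CURLY_BRACKET item* TYPE_FLEX_CLOSE_CURLY_BRACKET
```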

It's not perfect; for example, it allows item properties to be separated by whitespace (not just by a semicolon), but it does seem to solve the more important problem.
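If the separator should be strict, one option is to drop the `|itemProperty` alternative so that a missing `;` becomes an error, and to let each property recover on its own instead of relying on the swallow rule. This is an untested sketch: `property_recover` is a hypothetical rule name, and its stop set is a guess that decides how much text a missing `]` can consume.

```bnf
// Hypothetical stricter variant of the property rules above, for illustration only.
itemProperties ::= TYPE_FLEX_OPEN_SQUARE_BRACKET
    [!TYPE_FLEX_CLOSE_SQUARE_BRACKET itemProperty (TYPE_FLEX_SEMICOLON itemProperty)*]
    TYPE_FLEX_CLOSE_SQUARE_BRACKET {
  pin(".*")=1
}
itemProperty ::= TYPE_FLEX_WORD TYPE_FLEX_EQUALS? itemPropertyValue? {
  pin=1
  recoverWhile="property_recover"   // stray tokens after a property end up in an error element
}
itemPropertyValue ::= TYPE_FLEX_WORD

// Stop skipping at a separator, at the end of the list, or at a bracket/brace that
// probably belongs to the next item or to an enclosing body.
private property_recover ::= !(TYPE_FLEX_SEMICOLON | TYPE_FLEX_CLOSE_SQUARE_BRACKET
    | TYPE_FLEX_OPEN_SQUARE_BRACKET | TYPE_FLEX_OPEN_CURLY_BRACKET | TYPE_FLEX_CLOSE_CURLY_BRACKET)
```

With this variant, `[a=b c=d]` should be reported as an error at `c` rather than parsed as two properties, and `a=b=c` is no longer silently accepted; whether that trade-off is wanted depends on the language.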

Grammar-Kit's description of the `recoverWhile` attribute may also be of interest: https://github.com/JetBrains/Grammar-Kit/blob/master/resources/messages/attributeDescriptions/recoverWhile.html