Avoiding left recursion in parsing LiveScript object definitions

226 Views Asked by At

I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:

prop: "val"
+boolProp
-boolProp
prop2: val2

I have the key: value form working with this:

Expression ::= TestExpression
    | ParenExpression
    | OpExpression
    | ObjDefExpression
    | PropDefExpression
    | LiteralExpression
    | ReferenceExpression

PropDefExpression ::= Expression COLON Expression

ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*

// ... other expressions

But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?

2

There are 2 best solutions below

1
On BEST ANSWER

The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:

Expression -> PropDefExpression -> Expression COLON Expression

And it's not just left-recursive, it's ambiguous. E.g.

Expression COLON Expression COLON Expression

can be derived in two different ways (roughly, left-associative vs right-associative).

You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:

PropDefExpression ::= Identifier COLON Expression

Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.

Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).

Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.

0
On

I don't know how much help this will be, because I know nothing about GrammarKit and not much more about the language you're trying to parse.

However, it seems to me that

PropDefExpression ::= Expression COLON Expression

is not quite accurate, and it is creating an ambiguity when you add the boolean property production because an Expression might start with a unary - operator. In the actual grammar, though, a property cannot start with an arbitrary Expression. There are two types of key-property definitions:

name : expression
parenthesized_expression : expression

(Which is to say, expressions need to start with a ().

That means that a boolean property definition, starting with + or - is recognizable from the first token, which is precisely the condition needed for successful recursive descent parsing. There are several other property definition syntaxes, including names and parenthesized_expressions not followed by a :

That's easy to parse with an LR(1) parser, like the one Jison produces, but to parse it with a recursive-descent parser you need to left-factor. (It's possible that GrammarKit can do this for you, by the way.) Basically, you'd need something like (this is not complete):

PropertyDefinition ::= PropertyPrefix PropertySuffix? | BooleanProperty
PropertyPrefix ::= NAME | ParenthesizedExpression
PropertySuffix ::= COLON Expression | DOT NAME