What is the trick to parsing a C assignment-expression with no backtracking?

144 Views Asked by At

Parsing a C assignment-expression top-down has to choose between a conditional-expression and a unary-expression. The (unfortunately complicated) BNF is:

<assignment-expression>     ::= <conditional-expression>
                              | <unary-expression> <assignment-operator> <assignment-expression>
<conditional-expression>    ::= <logical-or-expression> 
                              | <logical-or-expression> ? <expression> : <conditional-expression>
<logical-or-expression>     ::= <logical-and-expression> 
                              | <logical-or-expression> || <logical-and-expression>
<logical-and-expression>    ::= <inclusive-or-expression> 
                              | <logical-and-expression> && <inclusive-or-expression>
<inclusive-or-expression>   ::= <exclusive-or-expression> 
                              | <inclusive-or-expression> '|' <exclusive-or-expression>
<exclusive-or-expression>   ::= <and-expression> 
                              | <exclusive-or-expression> ^ <and-expression>
<and-expression>            ::= <equality-expression> 
                              | <and-expression> & <equality-expression>
<equality-expression>       ::= <relational-expression> 
                              | <equality-expression> == <relational-expression> 
                              | <equality-expression> != <relational-expression>
<relational-expression>     ::= <shift-expression>
                              | <relational-expression> < <shift-expression>
                              | <relational-expression> > <shift-expression>
                              | <relational-expression> <= <shift-expression>
                              | <relational-expression> >= <shift-expression>
<shift-expression>          ::= <additive-expression>
                              | <shift-expression> << <additive-expression>
                              | <shift-expression> >> <additive-expression>
<additive-expression>       ::= <multiplicative-expression>
                              | <additive-expression> + <multiplicative-expression>
                              | <additive-expression> - <multiplicative-expression>
<multiplicative-expression> ::= <cast-expression>
                              | <multiplicative-expression> * <cast-expression>
                              | <multiplicative-expression> / <cast-expression>
                              | <multiplicative-expression> % <cast-expression>
<cast-expression>           ::= <unary-expression>
                              | ( <type-name> ) <cast-expression>
<unary-expression>          ::= <postfix-expression>
                              | ++ <unary-expression>
                              | -- <unary-expression>
                              | <unary-operator> <cast-expression>
                              | sizeof <unary-expression>
                              | sizeof <type-name>
<postfix-expression>        ::= <primary-expression>
                              | <postfix-expression> [ <expression> ]
                              | <postfix-expression> ( {<assignment-expression>}* )
                              | <postfix-expression> . <identifier>
                              | <postfix-expression> -> <identifier>
                              | <postfix-expression> ++
                              | <postfix-expression> --
<primary-expression>        ::= <identifier>
                              | <constant>
                              | <string>
                              | ( <expression> )

Note that a unary-expression satisfies the cast-expression rule.

If the input is X=3 and the parser tries the conditional-expression branch first then the input satisfies the primary-expression rule by simply stopping at X. The parser would have to backtrack and consider the unary-expression assignment-operator assignment-expression branch to discover the =3 part.

If the input is X==Y and the parser tries the unary-expression branch first then the assignment-operator is never found and the parser would have to backtrack and consider the conditional-expression branch to discover the equality-expression rule.

I don't see how to left-factor the BNF anywhere to avoid backtracking. Are recursion and backtracking unavoidable? Or is there a trick?

EDIT: In response to the comment requesting a reduced grammar, the following will demonstrate the X=3 vs. X==Y problem, but only to demonstrate how the reduced grammar requires backtracking, for people who don't wish to grok the full grammar. Whatever fix is made to avoid backtracking in this reduced grammar might not be possible with the full grammar.

<assignment-expression>     ::= <conditional-expression>
                              | <unary-expression> <assignment-operator> <unary-expression>
<conditional-expression>    ::= <unary-expression> == <unary-expression>
<unary-expression>          ::= <identifier>
<assignment_operator>       ::= '='

If the input is X=3 and the parser looks for conditional-expression first, == isn't found so the parser has to backtrack, while the input X==Y works with no backtracking.

If the input is X=3 and the parse looks for unary-expression first, the input is acceptable, while the input X==Y will require backtracking to match conditional-expression.

0

There are 0 best solutions below