How to write Kleene closure in a production?

I'm writing a parser using PLY for a language called s-lang. The grammar of the language contains the following production:

IdentList → IdentList , identifier ArrayBrackets*

I have already written the production for ArrayBrackets. I tried writing the above production as

def p_IdentList(t):
    '''IdentList : IdentList COMMA ID ArrayBrackets*'''

I have regular expressions for the tokens COMMA and ID. The problem is that when I include the star, I get the following error:

ERROR: main.py:115: Illegal name 'ArrayBrackets*' in rule 'IdentList'
Traceback (most recent call last):
  File "main.py", line 175, in <module>

I tried escaping the star, but it didn't help. How am I supposed to write the Kleene closure in a production?

[EDIT]

After examining this question closely, I saw that @GrijeshChauhan says that, given a nonterminal e, the Kleene closure of e (that is, e*) is given by the following production:

S → eS | ^

where eS is the concatenation of e with S and ^ is null/empty/epsilon. My question is, does e have to be a terminal? Can I not apply the same logic to produce a new production for a nonterminal, for example:

def p_ArrayBracketsSTAR(t):
    '''ArrayBracketsSTAR : ArrayBracketsSTAR ArrayBrackets | '''

There is 1 solution below

Maybe this helps:

def p_list(p):
    '''expression   : columnname IN LPAREN listbody RPAREN
    '''
    p[0] = Node('List', leaf=(p[1], tuple(p[4])))

def p_listbody(p):
    '''listbody     : value
                    | value COMMA listbody
    '''
    if len(p) == 2:
        p[0] = [p[1]]
    elif len(p) == 4:
        p[0] = [p[1]] + p[3]
    else:
        raise AssertionError(len(p))

def p_value(p):
    '''value        : SINGLEQUOTEDSTRING
                    | DOUBLEQUOTEDSTRING
                    | NUMBER
    '''
    p[0] = p[1]

The tokens for LPAREN, RPAREN, ID, COMMA, ...STRING and NUMBER are not shown here and should be obvious. The production rule for columnname is irrelevant.

It basically says that a List is a columnname followed by IN ( listbody ). A listbody is either a single value or a value followed by more values separated by COMMA.
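The same recursive pattern can be applied to the ArrayBrackets* from the question. The repeated symbol does not have to be a terminal; any grammar symbol can be repeated this way. Since the star is not accepted inside a PLY rule, the closure gets its own nonterminal with an empty alternative. What follows is only a sketch: the names ArrayBracketsStar and empty, the base case ID ArrayBracketsStar, and the shape of the values stored in p[0] are assumptions, because the question does not show them. The empty rule follows the idiom used in the PLY documentation, with each alternative on its own line.

```python
# Sketch only: rule names and result shapes are assumptions,
# not taken from the question's real grammar.

def p_empty(p):
    '''empty :'''
    # Epsilon production: matches no tokens at all.
    p[0] = None

def p_ArrayBracketsStar(p):
    '''ArrayBracketsStar : ArrayBracketsStar ArrayBrackets
                         | empty
    '''
    if len(p) == 3:
        # One more ArrayBrackets appended to those collected so far.
        p[0] = p[1] + [p[2]]
    else:
        # Zero occurrences: start with an empty list.
        p[0] = []

def p_IdentList(p):
    '''IdentList : IdentList COMMA ID ArrayBracketsStar
                 | ID ArrayBracketsStar
    '''
    if len(p) == 5:
        # Existing list, then a new (identifier, brackets) pair.
        p[0] = p[1] + [(p[3], p[4])]
    else:
        # Assumed base case: a single identifier starts the list.
        p[0] = [(p[1], p[2])]
```

The closure rule is left-recursive here, which suits the LR parsers that PLY generates. The ArrayBrackets rule itself is the one already written in the question.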

The listbody rule does not allow an empty list like mycolumn IN (), but you can add an alternative production rule for the list, such as columnname IN LPAREN RPAREN, to cover that case.
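A rule for the empty case might look like the sketch below. The function name p_list_empty is hypothetical, and the Node class is only a minimal stand-in for whatever tree class the snippet above really uses.

```python
class Node:
    # Minimal stand-in (assumption) for the Node class used in p_list.
    def __init__(self, type, leaf=None):
        self.type = type
        self.leaf = leaf

def p_list_empty(p):
    '''expression   : columnname IN LPAREN RPAREN
    '''
    # No listbody between the parentheses, so store an empty tuple.
    p[0] = Node('List', leaf=(p[1], ()))
```

Keeping the empty case in a separate rule leaves listbody unchanged, so every listbody still contains at least one value.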

Hope this helps.