Recursive pyparsing of function invocation expression with args and kwargs

46 Views Asked by At

I am trying to create a parser using pyparsing that parses an expression of below formats:

FUNC_NAME(ARG1, ARG2, ..., ARGN, K1=V1, K2=V2, ..., KN=VN)

ARGs and KWARG values can be literals (quoted strings, numbers, a date expression without quotes) or itself be functions with infinite nesting.

The issue I can't seem to get past is the recursive structure with Forward().

Here is what I have:

import pyparsing as pp 

LP, RP = map(pp.Suppress, "()")
ident = ppc.identifier("ident")
number = pp.pyparsing_common.number
string = (pp.QuotedString("\"") | pp.QuotedString("'"))
date = pp.Combine(pp.Word(pp.nums, exact=4) + "-" + pp.Word(pp.nums, exact=2) + "-" + pp.Word(pp.nums, exact=2))("date")
literal = date | number | string

body = pp.Forward()
body <<= body | (ident + pp.Suppress("=") + body) | (ident + LP + body + RP) | literal | pp.empty
expr = ident + (LP + pp.Optional(pp.Group(pp.delimitedList(body))) + RP)

running this expression gives me a recursion depth error. Not sure what I'm doing wrong.

2

There are 2 best solutions below

1
PaulMcG On

This is an interesting problem, and props to you for getting started with a recursive grammar.

The immediate issue is in this definition of the contents of body:

body <<= body | (ident + pp.Suppress("=") + body) | (ident + LP + body + RP) | literal | pp.empty

This creates a left-recursive (or "LR") expression, because to parse a body you first have to parse a body, but first you have to parse a body, but first... etc. The other alternative terms may also use body, but as part of a larger expression (after being enclosed in parentheses, or as part of an assignment). So the solution might be to just remove the leading body term:

body <<= (ident + pp.Suppress("=") + body) | (ident + LP + body + RP) | literal | pp.empty

But you might have wanted to support enclosure of a body in parentheses, so changing to this would also resolve your LR issue:

body <<= LP + body + RP | (ident + pp.Suppress("=") + body) | (ident + LP + body + RP) | literal | pp.empty

Either of these changes will resolve your LR issue. By default, pyparsing does not support LR parsers. You can enable this, using a feature that was added in pyparsing 3.0.0:

pp.ParserElement.enable_left_recursion()

But with either of the suggested changes, it would not be necessary.

I also added these lines, to generate a railroad diagram of your parser. I've found these diagrams to be very helpful in visualizing the parser's structure;

pp.autoname_elements()
expr.create_diagram("lr_function_parser.html")

Which generated this diagram:

railroad diagram for the OPs parser

Good luck in your further pyparsing work!

0
PaulMcG On

When I've had to write parsers like this that required recognition of function calls, I usually defined a specific expression for function calls, and then included them as an option in my base operand or expression Like this:

# using previous definitions for ident, literal, etc.
EQ = pp.Suppress("=")
expr = pp.Forward()
fn_arg = ident + EQ + expr | expr
fn_call = pp.Group(ident + pp.Group(LP + pp.Optional(pp.delimited_list(fn_arg)) + RP))
expr <<= fn_call | literal | ident | pp.Group(LP + expr + RP)

Where expr takes the place of your body. From this diagram, see how this work compared to the above:

railroad diagram of new parser