Question
Is there any way to declare function arguments as non-strict (passed by-name)?
If this is not possible directly: are there any helper functions or decorators that help me achieve something similar?
Concrete example
Here is a littly toy-example to experiment with.
Suppose that I want to build a tiny parser-combinator library that can cope with the following classic grammar for arithmetic expressions with parentheses (numbers replaced by a single literal value 1
for simplicity):
num = "1"
factor = num
| "(" + expr + ")"
term = factor + "*" + term
| factor
expr = term + "+" + expr
| term
Suppose that I define a parser combinator as an object that has a method parse
that can take list of tokens, current position, and either throw a parse error, or return a result and a new position. I can nicely define a ParserCombinator
base class that provides +
(concatenation) and |
(alternative). Then I can define parser combinators that accept constant strings, and implement +
and |
:
# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass
# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
def __add__(self, next):
return AddCombinator(self, next)
def __or__(self, other):
return OrCombinator(self, other)
# Literally taken string constants
class Lit(ParserCombinator):
def __init__(self, string):
self.string = string
def parse(self, tokens, pos):
if pos < len(tokens):
t = tokens[pos]
if t == self.string:
return t, (pos + 1)
else:
raise ParseError
else:
raise UnexpectedEndOfInput
def lit(str):
return Lit(str)
# Concatenation
class AddCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
x, p1 = self.first.parse(tokens, pos)
y, p2 = self.second.parse(tokens, p1)
return (x, y), p2
# Alternative
class OrCombinator(ParserCombinator):
def __init__(self, first, second):
self.first = first
self.second = second
def parse(self, tokens, pos):
try:
return self.first.parse(tokens, pos)
except:
return self.second.parse(tokens, pos)
So far, everything is fine. However, because the non-terminal symbols of the grammar are defined in a mutually recursive fashion, and I cannot eagerly unfold the tree of all possible parser combinations, I have to work with factories of parser combinators, and wrap them into something like this:
# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
def __init__(self, parserFactory):
self.parserFactory = parserFactory
def parse(self, tokens, pos):
return self.parserFactory().parse(tokens, pos)
def p(parserFactory):
return LazyParserCombinator(parserFactory)
This indeed allows me to write down the grammar in a way that is very close to the EBNF:
num = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term = p(lambda: (factor + lit("*") + term) | factor)
expr = p(lambda: (term + lit("+") + expr) | term)
And it actually works:
tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))
However, the p(lambda: ...)
in every line is a bit annoying. Is there some idiomatic way to get rid of it? It would be nice if one could somehow pass the whole RHS of a rule "by-name", without triggering the eager evaluation of the infinite mutual recursion.
What I've tried
I've checked what's available in the core language: it seems that only if
, and
and or
can "short-circuit", please correct me if I'm wrong.
I've tried looking at how other non-toy-example libraries do this.
For example, funcparserlib uses explicit forward declarations to avoid mutual recursion (look at the
forward_decl
andvalue.define
part in github README.md example code).The
parsec.py
uses some special@generate
decorators and seems to do something like monadic parsing using coroutines. That's all very nice, but my goal is to understand what options I have with regards to the basic evaluation strategies available in Python.
I've also found something like the lazy_object_proxy.Proxy
, but it didn't seem to help to instantiate such objects in more concise way.
So, is there a nicer way to pass arguments by-name and avoid the blowup of mutually recursively defined values?
It's a nice idea, but it's not something that Python's syntax allows for: Python expressions are always evaluated strictly (with the exception of
if
blocks andand
andor
short-circuiting expressions).In particular, the problem is that in an expression like:
The
p
function argument is always received with a new name binding to the same object. The object resulting from evaluatinglit("1")
is not named anything (until a name is created by the formal parameter top
), so there is no name there to bind to. Conversely, there must be an object there, or otherwisep
wouldn't be able to receive a value at all.What you could do is add a new object to use instead of a lambda to defer evaluation of a name. For example, something like:
In this case,
d.lit
actually doesn't returnlit
, it returns aDeferredLookup
object that will usegetattr(locals(), 'lit')
to resolve its members when they are actually used. Note that this captureslocals()
eagerly, which you might not want; you can adapt that to use a lambda, or better yet just create all your entities in some other namespace anyway.You still get the wart of the
d.
in the syntax, which may or may not be a deal-breaker, depending on your goals with this API.