Non-strict by-name arguments in Python?


Question

Is there any way to declare function arguments as non-strict (passed by-name)?

If this is not possible directly: are there any helper functions or decorators that help me achieve something similar?


Concrete example

Here is a little toy example to experiment with.

Suppose that I want to build a tiny parser-combinator library that can cope with the following classic grammar for arithmetic expressions with parentheses (numbers replaced by a single literal value 1 for simplicity):

num    = "1"

factor = num 
       | "(" + expr + ")"

term   = factor + "*" + term 
       | factor

expr   = term + "+" + expr
       | term

Suppose that I define a parser combinator as an object with a parse method that takes a list of tokens and a current position, and either throws a parse error or returns a result together with a new position. I can nicely define a ParserCombinator base class that provides + (concatenation) and | (alternative). Then I can define parser combinators that accept constant strings, and implement + and |:

# Two kinds of errors that can be thrown by a parser combinator
class UnexpectedEndOfInput(Exception): pass
class ParseError(Exception): pass

# Base class that provides methods for `+` and `|` syntax
class ParserCombinator:
  def __add__(self, next):
    return AddCombinator(self, next)
  def __or__(self, other):
    return OrCombinator(self, other)

# Literally taken string constants
class Lit(ParserCombinator):
  def __init__(self, string):
    self.string = string

  def parse(self, tokens, pos):
    if pos < len(tokens):
      t = tokens[pos]
      if t == self.string:
        return t, (pos + 1)
      else:
        raise ParseError
    else:
      raise UnexpectedEndOfInput

def lit(s):
  return Lit(s)

# Concatenation
class AddCombinator(ParserCombinator):
  def __init__(self, first, second):
    self.first = first
    self.second = second
  def parse(self, tokens, pos):
    x, p1 = self.first.parse(tokens, pos)
    y, p2 = self.second.parse(tokens, p1)
    return (x, y), p2

# Alternative
class OrCombinator(ParserCombinator):
  def __init__(self, first, second):
    self.first = first
    self.second = second
  def parse(self, tokens, pos):
    try:
      return self.first.parse(tokens, pos)
    except (ParseError, UnexpectedEndOfInput):
      return self.second.parse(tokens, pos)

So far, everything is fine. However, because the non-terminal symbols of the grammar are defined in a mutually recursive fashion, and I cannot eagerly unfold the tree of all possible parser combinations, I have to work with factories of parser combinators, and wrap them into something like this:

# Wrapper that prevents immediate stack overflow
class LazyParserCombinator(ParserCombinator):
  def __init__(self, parserFactory):
    self.parserFactory = parserFactory
  def parse(self, tokens, pos):
    return self.parserFactory().parse(tokens, pos)

def p(parserFactory):
  return LazyParserCombinator(parserFactory)

This indeed allows me to write down the grammar in a way that is very close to the EBNF:

num    = p(lambda: lit("1"))
factor = p(lambda: num | (lit("(") + expr + lit(")")))
term   = p(lambda: (factor + lit("*") + term) | factor)
expr   = p(lambda: (term + lit("+") + expr) | term)

And it actually works:

tokens = [str(x) for x in "1+(1+1)*(1+1+1)+1*(1+1)"]
print(expr.parse(tokens, 0))

However, the p(lambda: ...) in every line is a bit annoying. Is there some idiomatic way to get rid of it? It would be nice if one could somehow pass the whole RHS of a rule "by-name", without triggering the eager evaluation of the infinite mutual recursion.


What I've tried

I've checked what's available in the core language: it seems that only if branches, conditional expressions (x if c else y) and the boolean operators and and or can "short-circuit"; please correct me if I'm wrong.
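
As a small sanity check of that claim (a standalone snippet, unrelated to the parser code), the right operand of or and the untaken branch of a conditional expression are indeed never evaluated:

def boom():
  raise RuntimeError("this operand was evaluated after all")

x = True or boom()           # boom() is never called
y = 1 if True else boom()    # the else-branch is never evaluated
print(x, y)                  # prints: True 1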

I've tried looking at how other non-toy-example libraries do this.

  • For example, funcparserlib uses explicit forward declarations to avoid mutual recursion (see the forward_decl and value.define parts of the example code in its GitHub README.md); a rough sketch of that idea, adapted to the toy classes above, is shown right after this list.

  • The parsec.py library uses a special @generate decorator and seems to do something like monadic parsing using coroutines. That's all very nice, but my goal is to understand what options I have with regard to the basic evaluation strategies available in Python.
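
Here is a rough sketch of that forward-declaration idea (not funcparserlib's actual code; the Forward class below is a hypothetical adaptation to the toy classes above): an initially empty combinator is created first and filled in later, once all rules have been built.

class Forward(ParserCombinator):
  def __init__(self):
    self.target = None
  def define(self, parser):
    self.target = parser
  def parse(self, tokens, pos):
    # delegate to the parser that was plugged in via define()
    return self.target.parse(tokens, pos)

expr = Forward()
term = Forward()
num = lit("1")
factor = num | (lit("(") + expr + lit(")"))
term.define((factor + lit("*") + term) | factor)
expr.define((term + lit("+") + expr) | term)

This avoids the p(lambda: ...) wrappers, but at the cost of the separate define(...) calls.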

I've also found lazy_object_proxy.Proxy, but it didn't seem to help me instantiate such objects in a more concise way.

So, is there a nicer way to pass arguments by-name and avoid the blowup of mutually recursively defined values?

2 Answers


Accepted Answer

It's a nice idea, but it's not something that Python's syntax allows for: Python expressions are always evaluated strictly (with the exception of if branches and the short-circuiting and / or operators).

In particular, the problem is that in an expression like:

num = p(lit("1"))

The argument of p is received as a new name bound to an already-evaluated object: lit("1") is reduced to an object before p is called, and that object isn't named anything (a name only appears once it is bound to p's formal parameter), so there is no name that could be passed "by-name". Conversely, there must be an object there, or otherwise p wouldn't be able to receive a value at all.
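
To make the order of evaluation explicit, here is a minimal standalone demonstration (hypothetical names): the argument expression runs to completion before the callee is entered.

def make_arg():
    print("argument evaluated")
    return 42

def callee(x):
    print("callee received", x)

callee(make_arg())
# argument evaluated
# callee received 42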

What you could do is add a new object to use instead of a lambda to defer evaluation of a name. For example, something like:

class DeferredNamespace(object):
    def __init__(self, namespace):
        self.__namespace = namespace
    def __getattr__(self, name):
        return DeferredLookup(self.__namespace, name)

class DeferredLookup(object):
    def __init__(self, namespace, name):
        self.__namespace = namespace
        self.__name = name
    def __resolve(self):
        # the captured namespace is a plain dict (e.g. locals()),
        # so the deferred name is looked up by key, not by attribute
        return self.__namespace[self.__name]
    def __getattr__(self, name):
        return getattr(self.__resolve(), name)
    def __call__(self, *args, **kwargs):
        # lets usages like d.lit("1") work: resolve the name, then forward the call
        return self.__resolve()(*args, **kwargs)

d = DeferredNamespace(locals())

num = p(d.lit("1"))

In this case, d.lit actually doesn't return lit: it returns a DeferredLookup object that looks the name lit up in the captured namespace only when it is actually used (here, when it is called). Note that this captures locals() eagerly, which you might not want; you can adapt it to use a lambda, or better yet just create all your entities in some other namespace anyway.
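
A minimal, hypothetical illustration of the deferral, using globals() instead of locals() so it can run as a standalone snippet: the name being looked up does not even have to exist yet when the DeferredLookup is created.

d = DeferredNamespace(globals())

later = d.greet                  # 'greet' does not exist yet; nothing is resolved here

def greet(who):
    return "hello " + who

print(later("world"))            # resolved (and called) only now: prints "hello world"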

You still get the wart of the d. in the syntax, which may or may not be a deal-breaker, depending on your goals with this API.

Second Answer

Special solution for functions that must accept exactly one by-name argument

If you want to define a function f that has to take a single argument by-name, consider making f a decorator. Instead of call sites littered with lambdas, the decorator can then receive the function definition directly.


The lambdas in the question appear because we need a way to make the evaluation of the right-hand sides lazy. However, if we turn the definitions of the non-terminal symbols into defs rather than local variables, their right-hand sides are likewise not executed immediately. All that remains is to convert these defs into ParserCombinators, and for that we can use decorators.


We can define a decorator that wraps a function into a LazyParserCombinator as follows:

def rule(f):
  return LazyParserCombinator(f)

and then apply it to the functions that hold the definitions of each grammar rule:

@rule 
def num():    return lit("1")

@rule 
def factor(): return num | (lit("(") + expr + lit(")"))

@rule 
def term():   return (factor + lit("*") + term) | factor

@rule 
def expr():   return (term + lit("+") + expr) | term

The syntactic overhead within the right-hand sides of the rules is minimal (no overhead for referencing other rules, no p(...) wrappers or ruleName() parentheses needed), and there is no counter-intuitive boilerplate with lambdas.
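
For completeness, the rules defined this way can be exercised just like before; the decorated defs are only invoked (via LazyParserCombinator) at parse time, by which point all rule names exist:

tokens = [str(x) for x in "1+(1+1)*1"]
print(expr.parse(tokens, 0))   # prints the nested result tuple and the final position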


Explanation:

Given a higher-order function h, we can use it to decorate another function f as follows:

@h
def f():
  <body>

What this does is essentially:

def f():
  <body>

f = h(f)

and h is not constrained to returning functions: it can return any other object as well, such as the ParserCombinators above.
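
A tiny self-contained illustration of this last point (hypothetical h and f, unrelated to the parser code): after decoration, the name f is bound to whatever h returned, here a plain tuple rather than a function.

def h(func):
  return ("wrapped", func)

@h
def f():
  return 42

print(f)        # ('wrapped', <function f at 0x...>)
print(f[1]())   # 42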