Nesting generator expression calling a dynamically referenced function

I'm seeing some odd behavior that I can't explain when nesting generator expressions in a loop in Python 3, where each generator expression calls a function that is referenced through the loop variable.

Here is a very simplified case reproducing the problem:

double = lambda x: x * 2
triple = lambda x: x * 3
processors = [double, triple]

data = range(3)
for proc in processors:
    data = (proc(i) for i in data)

result = list(data)
print(result)
assert result == [0, 6, 12]

In this case, I expected each number to be multiplied by 6 (triple(double(x))) but in reality triple(triple(x)) is called. It's more or less clear to me that proc points to triple when the generator expression is run, regardless of what it pointed to when the generator expression was created.

So, (1) is this expected and can someone point to some relevant info in the Python docs or elsewhere explaining this?

and (2) Can you recommend another method of nesting generator expressions, where each level calls a dynamically provided callable?

EDIT: I am seeing it on Python 3.8.x, haven't tested with other versions

There are 4 best solutions below

0
On

I've found that using map does work, which is a good answer for #2 as far as I'm concerned:

double = lambda x: print(x) or x * 2
triple = lambda x: print(x) or x * 3
processors = [double, triple]

data = range(3)
for proc in processors:
    data = map(proc, data)

result = list(data)
print(result)
assert result == [0, 6, 12]

Still, I'd be happy to know about #1 - what is the reason for this behavior and is this documented somewhere?

0
On

This is a result of two things:

  • Generators are lazily evaluated, so the functions are only called when the generator is consumed,
  • Names are resolved at evaluation time, not when the generator is created.

So by the time you consume the generator with list(data), the name proc refers to the function triple, and both generators call whatever function is bound to the name proc at that moment, so you get triple twice.

map works because it is an ordinary function call: when you pass proc as an argument, map receives the value proc has at the time of the call, which happens inside the loop while proc still refers to the double function.
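
The difference can be seen side by side in a minimal sketch. The names late_vs_early, lazy and eager are made up for this illustration, and the code is wrapped in a function only to keep it self-contained. As far as the documentation goes, the section on generator expressions in the Python language reference states that only the iterable of the leftmost for clause is evaluated immediately; everything else is evaluated when the generator is consumed.

```python
def late_vs_early():
    double = lambda x: x * 2
    triple = lambda x: x * 3

    proc = double
    lazy = (proc(i) for i in range(3))  # `proc` is looked up only when consumed
    eager = map(proc, range(3))         # `proc` is evaluated now and passed in

    proc = triple  # rebind the name before anything is consumed

    return list(lazy), list(eager)


lazy_result, eager_result = late_vs_early()
print(lazy_result)   # [0, 3, 6]  (triple was used)
print(eager_result)  # [0, 2, 4]  (double was used)
```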

0
On

Yes, it's expected, and you got the reason right.

As generators are lazy, proc(i) is evaluated only when a value is requested, which means looking up both proc and i at that moment. By the time you finally request values, proc is already triple, so that's what gets used.

In this particular case, data = map(proc, data) does the job. It works because map captures and remembers the proc as it was when you called map.

You could do the same with a generator function. I tried with a generator expression like

data = (p(i) for p in [proc] for i in data)

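but it failed with ValueError: generator already executing, because the data in the second for clause is also looked up lazily and by then refers to the generator itself. This worked, though: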

data = (lambda proc: (proc(i) for i in data))(proc)
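
The generator function variant mentioned above could look roughly like this. It is a sketch, and the names apply_proc and run_pipeline are hypothetical helpers, not something from the question. Arguments to a generator function are bound when it is called, even though its body only runs on iteration, so each stage keeps its own proc.

```python
def apply_proc(proc, items):
    """Yield proc(item) for each item; proc is bound when apply_proc is called."""
    for item in items:
        yield proc(item)


def run_pipeline(processors, data):
    # Each call creates a new generator with its own `proc` and its own source.
    for proc in processors:
        data = apply_proc(proc, data)
    return list(data)


result = run_pipeline([lambda x: x * 2, lambda x: x * 3], range(3))
print(result)  # [0, 6, 12]
```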
0
On

This does not work because proc is reassigned to triple in the second iteration of the loop, and the generator created in the first iteration looks up proc only when it runs. This becomes more explicit when you unroll the loop and remove the last generator:

double = lambda x: x * 2
triple = lambda y: y * 3

data = range(3)
proc = double
data = (proc(i) for i in data)

# Let's change `proc` to be triple
proc = triple

result = list(data)
print(result)
# [0, 3, 6]

The data has been tripled: the first generator does not store the function proc referred to when it was created. It looks up the name proc when it is consumed, and by then proc has been reassigned to triple.
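
Keeping the second generator in the unrolled version shows where the output in the question comes from. This is only a sketch: the function name unrolled is made up, and the wrapper function is there just to keep the example self-contained.

```python
def unrolled():
    double = lambda x: x * 2
    triple = lambda x: x * 3

    data = range(3)
    proc = double
    data = (proc(i) for i in data)  # first stage, created while proc is double

    proc = triple
    data = (proc(i) for i in data)  # second stage, created while proc is triple

    # Both stages look up `proc` only now, and it refers to triple.
    return list(data)


result = unrolled()
print(result)  # [0, 9, 18], i.e. triple(triple(x))
```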