What is in a Python closure and what are the caveats for people used to OCaml?

330 Views Asked by ysalmon At 04 August 2019 at 12:38

This is a sort of follow-up to an old answer to a question about the necessity of functools.partial: while that answer very clearly explains the phenomenon and the basic reason for it, some points are still unclear to me.

To recap, the following Python code

myfuns = [lambda arg: str(arg) + str(clo) for clo in range(4)]
try :
    clo
except NameError :
    print("there is no clo")
for arg in range(4) :
    print(myfuns[arg](arg), end=", ")

gives 03, 13, 23, 33, , while the similar OCaml code

let myfuns = Array.map (fun clo -> fun arg -> (string_of_int arg) ^ (string_of_int clo)) [|0;1;2;3|];;
(* there is obviously no clo variable here *)
for arg = 0 to 3 do
  print_string (myfuns.(arg) arg); print_string ", "
done;;

gives 00, 11, 22, 33, .

I understand this is related to a different notion of closure applied to lambda arg: str(arg) + str(clo) and its correspondent fun arg -> (string_of_int arg) ^ (string_of_int clo).

In OCaml, the closure maps the identifier clo to the value of the variable clo in the outer scope at the time of creation of the closure. In Python, the closure somehow contains the variable clo per se, which explains that it gets affected by the incrementation caused by the for generator.

Is this correct ?

How is this done ? The clo variable does not exist in the global scope, as evidenced by my try/except construct. Generally, I would assume that the variable of a generator is local to it and so does not survive it. So, again, where is clo ? This answer gives insight about __closure__ but I still do not completely grasp how it manages to refer to the clo variable per se during the generation.

Also, besides this strange behaviour (strange, that is, for people used to statically binding languages), are there other caveats one should be aware of ?

Goswin von Brederlow On 05 August 2019 at 13:05

The difference is that Python has variables, while OCaml has bindings and currying.

Python:
myfuns = [lambda arg: str(arg) + str(clo) for clo in range(4)]

The for loop creates a variable clo and assigns the values 0, 1, 2, 3 to it for each iteration. The lambda binds the variable so it can later call str(clo). But since the loop last assigned 3 to clo all the lambdas append the same string.

OCaml:
let myfuns = Array.map (fun clo -> fun arg -> (string_of_int arg) ^ (string_of_int clo)) [|0;1;2;3|];;

Here you call Array.map with the array [|0;1;2;3|]. This will evaluate the fun clo -> ... binding clo to each value in the array in turn. Each time the binding will be different so the string_of_int clo turns out different too.

While it is not the only difference, this partial evaluation saves the day in Python too. If you write your code like this:

Python:
def make_lambda(clo):
    return lambda arg: str(arg) + str(clo)
myfuns = [make_lambda(clo) for clo in range(4)]

The evaluation of make_lambda causes the clo in the lambda to be bound to the value of the make_lambda argument, not the variable in the for loop.

Another fix is binding the value in the lambda explicitly:

myfuns = [lambda arg, clo=clo: str(arg) + str(clo) for clo in range(4)]
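To check that each of these workarounds really freezes a separate value per function, here is a minimal sketch that runs them side by side. The functools.partial variant (the topic of the original question) and the map variant (a direct translation of the OCaml Array.map call) are additions that do not appear in this answer, and the helper names concat and render are made up for illustration.

```python
from functools import partial


def make_lambda(clo):
    # Each call creates a fresh local variable `clo`, so each lambda gets its own cell.
    return lambda arg: str(arg) + str(clo)


def concat(clo, arg):
    # Hypothetical helper, used only for the partial() variant below.
    return str(arg) + str(clo)


def render(funs):
    # Call the i-th function with i, like the loop in the question.
    return [f(i) for i, f in enumerate(funs)]


late_bound = [lambda arg: str(arg) + str(clo) for clo in range(4)]
via_factory = [make_lambda(clo) for clo in range(4)]
via_default = [lambda arg, clo=clo: str(arg) + str(clo) for clo in range(4)]
via_partial = [partial(concat, clo) for clo in range(4)]
via_map = list(map(lambda clo: lambda arg: str(arg) + str(clo), range(4)))

print(render(late_bound))   # ['03', '13', '23', '33']
print(render(via_factory))  # ['00', '11', '22', '33']
print(render(via_default))  # ['00', '11', '22', '33']
print(render(via_partial))  # ['00', '11', '22', '33']
print(render(via_map))      # ['00', '11', '22', '33']
```

One caveat of the default-argument form is that a caller can accidentally override clo by passing a second positional argument; the factory, partial and map forms do not expose the captured value that way.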

Andreas Rossberg On 14 August 2019 at 08:56

You already have a couple of excellent answers, but to focus on the essence, the difference is due to two design choices Python made:

All variable bindings are mutable, and captured as such in closures.
for comprehensions do not bind a different variable for every iteration, but reassign a new value to the same one.

Neither design choice is necessary, in particular not the latter. For example, in OCaml, the variable of a for-loop is not mutable, but a fresh binding for each iteration. Even more interesting, in JavaScript, for (let x of ...) ... will make x mutable (unless you use const instead), but it still is separate for every iteration. That fixes the behaviour of JavaScript's older for (var x in ...), which has the same problem as Python and is notorious for leading to subtle bugs with closures.
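The first design choice (captured bindings stay mutable) can be seen without any loop at all. The following is a small illustrative sketch, not taken from the answer; make_counter, increment and peek are hypothetical names. Two closures capture the same variable, and a rebinding made through one is visible through the other.

```python
def make_counter():
    count = 0  # a local variable that both inner functions capture

    def increment():
        nonlocal count  # rebind the captured variable instead of creating a new local
        count += 1
        return count

    def peek():
        return count  # reads whatever the shared variable currently holds

    return increment, peek


increment, peek = make_counter()
increment()
increment()
print(peek())  # 2
```

In OCaml the same effect would require an explicit ref, because an ordinary let binding cannot be reassigned.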

ivg On 05 August 2019 at 14:01 (accepted answer)

When Python creates a closure, it collects all free variables into a tuple of cells. Since each cell is mutable, and Python passes a reference to the cell into the closure, you will see the last value of the induction variable in your loop. Let's look under the hood: here is our function, with i occurring free in our lambda expression,

def make_closures():
    return [lambda x: str(x) + str(i) for i in range(4)]

and here is the disassembly of this function

  2           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (4)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                21 (to 37)
             16 STORE_DEREF              0 (i)
             19 LOAD_CLOSURE             0 (i)
             22 BUILD_TUPLE              1
             25 LOAD_CONST               2 (<code object <lambda>)
             28 MAKE_CLOSURE             0
             31 LIST_APPEND              2
             34 JUMP_ABSOLUTE           13
        >>   37 RETURN_VALUE

We can see that STORE_DEREF at offset 16 takes an ordinary integer value from the top of the stack (TOS) and stores it in a cell. The next three instructions prepare the closure structure on the stack, and finally MAKE_CLOSURE packs everything into the closure, which is represented as a tuple (in our case a 1-tuple) of cells,

 >>> fs = make_closures()
 >>> fs[0].__closure__
 (<cell at 0x7ff688624f30: int object at 0xf72128>,)

so it is a tuple with a cell containing an int,

 >>> fs[0].__closure__[0]
 <cell at 0x7ff688624f30: int object at 0xf72128>

 >>> type(fs[0].__closure__[0])
 cell
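The listing and session above come from a Python 2 interpreter, so opcode names and offsets will differ on a current one. As a rough sketch for Python 3 (assuming only the standard dis and types modules), the following collects the cell-related opcode names your own interpreter emits and repeats the __closure__ inspection. collect_opnames is a hypothetical helper, not part of dis, and the exact opcode list is deliberately not shown because it depends on the version.

```python
import dis
import types


def make_closures():
    return [lambda x: str(x) + str(i) for i in range(4)]


def collect_opnames(code):
    """Gather opcode names from a code object and every code object nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= collect_opnames(const)
    return names


# Keep only the instructions that deal with cells and closures.
cell_ops = sorted(
    name
    for name in collect_opnames(make_closures.__code__)
    if "DEREF" in name or "CLOSURE" in name or "CELL" in name
)
print(cell_ops)  # version-dependent

fs = make_closures()
print(type(fs[0].__closure__[0]).__name__)  # cell
print(fs[0].__code__.co_freevars)           # ('i',)
```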

The crucial point to understand here is that the free variables are shared by all the closures,

>>> fs[0].__closure__
(<cell at 0x7f1d63f08b40: int object at 0xf16128>,)

>>> fs[1].__closure__
(<cell at 0x7f1d63f08b40: int object at 0xf16128>,)
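Rather than comparing the printed addresses by eye, the sharing can be checked with an identity test. This is a minimal sketch for Python 3; the variable name cells is only illustrative.

```python
def make_closures():
    return [lambda x: str(x) + str(i) for i in range(4)]


fs = make_closures()
cells = [f.__closure__[0] for f in fs]

# Every lambda holds the very same cell object, not a copy of it.
print(all(cell is cells[0] for cell in cells))  # True

# The cell holds the last value the loop assigned to i.
print(cells[0].cell_contents)  # 3

print([f(n) for n, f in enumerate(fs)])  # ['03', '13', '23', '33']
```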

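Since each cell is a reference to a local variable in the enclosing function scope, we can indeed find the variable i in the make_closures function, in its co_cellvars attribute (the session shown is Python 2; in Python 3 the code object is spelled __code__, and depending on the version the owning scope may be the comprehension's own nested function rather than make_closures),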

>>> make_closures.func_code.co_cellvars
('i',)

Therefore, we have the somewhat surprising effect of an integer value being passed by reference and becoming mutable. The main surprises in Python are the way variables are packed into cells and the fact that the for loop does not have its own scope.
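The "passed by reference" effect can be made visible by writing to the shared cell from outside. This sketch assumes Python 3.7 or later, where cell_contents is writable; it is meant purely as a demonstration, not as something to do in real code.

```python
def make_closures():
    return [lambda x: str(x) + str(i) for i in range(4)]


fs = make_closures()
shared_cell = fs[0].__closure__[0]

before = fs[1](1)
shared_cell.cell_contents = 10  # overwrite the value seen by every closure
after = fs[1](1)

print(before, after)  # 13 110
```

The integer objects themselves are still immutable; what changes is which object the single shared cell points to.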

To be fair, you can achieve the same result in OCaml if you manually create a reference and capture it in a closure. e.g.,

let make_closures () =
  let arg = ref 0 in
  let fs = Array.init 4 (fun _ -> fun _ -> assert false) in
  for i = 0 to 3 do
    fs.(i) <- (fun x -> string_of_int x ^ string_of_int !arg);
    incr arg
  done;
  fs

so that

let fs = make_closures ();;
fs.(1) 1;;
- : string = "14"

Historical References

Both OCaml and Python are influenced by Lisp, and both employ the same technique for implementing closures. Surprisingly, the results differ, not because of different interpretations of lexical scoping or of the closure environment, but because of the different object (data) models of the two languages.

The OCaml data model is not only simpler to understand but is also well defined by the rigorous type system. Python, due to its dynamic structure, leaves a lot of freedom in the interpretation of objects and their representation. Therefore, in Python, they decided to make variables bound in the lexical context of a closure mutable by default (even if they are integers). See also PEP 227 for more context.
