Statelessness implies referential transparency?

124 Views Asked by At

I have this code

data Slist a = Empty | Scons (Sexp a) (Slist a) 
data Sexp a = AnAtom a | AnSlist (Slist a)
data Fruit = Peach | Apple | Pear | Lemon | Fig deriving (Show,Eq)

sxOccurs oatm sxp =
  let slOC Empty = 0
      slOC (Scons se sls) = (seOC se) + (slOC sls)
      seOC (AnAtom atm) = if (atm == oatm) then 1 else 0
      seOC (AnSlist sla) = slOC sla
  in seOC sxp

As you can see in sxOccurs I've got two helper functions inside the let which are "mutually self-referential" as my The Little MLer calls them: slOC and seOC. So in SML you must use the keyword and to let them know about each other and "cross-reference" each other. BTW, sxOccurs counts the number of a specific AnAtom objects in an s-list, in my example atoms are Fruit variables.

My question is, is this an example of referential transparency? Similarly, in Davie he gives this example

let s0 = emptyStack
    s1 = push 12.2 s0
    s2 = push 7.1 s1
    s3 = push 6.7 s2
    s4 = divStack s3
    s5 = push 4.3 s4
    s6 = subtStack s5
    s7 = multStack s6
    s8 = push 2.2 s7
    s9 = addStack s8
in popStack s9

noting that a stack in Imperative-land is constantly mutating the stack, while Haskell is creating a new si variable for each stack operation. Then he says that each of these lines could be shuffled into a different order and the outcome would be unchanged. AFAICT this is the same basic idea as my sxOccurs when it doesn't care what order I present the sub-functions. So, again, is this the deeper meaning of referential transparency? If not, what is this I'm showing here?

2

There are 2 best solutions below

0
Daniel Wagner On BEST ANSWER

Referential transparency means this, and only this: you may replace a variable by its definition without changing the meaning of the program. This is called "referential transparency" because you can "look through" a reference to its definition.

For example, you write:

slOC Empty = 0
slOC (Scons se sls) = (seOC se) + (slOC sls)
seOC (AnAtom atm) = if (atm == oatm) then 1 else 0
seOC (AnSlist sla) = slOC sla

Here's a couple transformations you could make, thanks to referential transparency:

-- replace slOC by its definition
seOC (AnSlist sla) = (\v -> case v of Empty -> 0; SCons se sls -> seOC se + slOC sls) sla
-- replace slOC by its definition *again*, starting from the previous line
seOC (AnSlist sla) = (\v -> case v of
    Empty -> 0
    SCons se sls -> seOC se + (\v -> case v of
        Empty -> 0
        SCons se sls -> seOC se + slOC sls
        ) sls
    ) sla
-- replace slOC by its definition in another equation
slOC (Scons se sls) = seOC se + (\v -> case v of Empty -> 0; SCons se sls -> seOC se + slOC sls) sls
-- or, you could replace seOC by its definition instead
slOC (SCons se sls) = (\v -> case v of
    AnAtom atm -> if atm == oatm then 1 else 0
    AnSlist sla -> sLOC sla
    ) se + slOC sls
-- or you could do both, of course

Well, of course, right? By now you may be thinking, "But Daniel, how could this property ever fail?". I will turn briefly to another language to illustrated: C.

int *x = malloc(sizeof(*x));
x[0] = 42;
printf("%d\n", x[0]);

In case you don't read C well, this creates a new variable named x, allocates some space for it, writes 42 into that space, then prints out the value stored in that space. (We should probably expect it to print 42!) But I've defined x = malloc(sizeof(*x)) in the first line; can I replace x by this definition elsewhere?

No! This is a very different program:

int *x = malloc(sizeof(*x));
malloc(sizeof(*x))[0] = 42;
printf("%d\n", x[0]);

It's still a syntactically valid program, but now x[0] has not been initialized at the moment we reach the line that prints it -- because we allocated a second, independent chunk of space, and initialized that other space instead.

This turns out to be the main way that other languages violate referential transparency: when you have variables whose value can change, it is not safe to replace references to them by their defined value, either because it may have changed since then or because this will cause it not to change in the way the rest of the program expected it to. Haskell eschews this ability; variables, once assigned a value, are never modified.

0
Silvio Mayolo On

What you've described, as indicated in the comments already, is more accurately referred to as "mutual recursion", when two functions call each other in the course of their evaluation. Referential transparency, in effect, says that, given the exact same inputs, a function will produce the same outputs. This is not true in, say, Python, where we could write this function

global_var = 0

def my_function():
    return global_var

my_function() # 0
global_var = 100
my_function() # 100

We called my_function with the same inputs, but it mysteriously produced a different output. Now, of course, in this example, it's obvious why that is, but the idea behind referential transparency is that, in real-world code, it's not going to be that obvious. If the language you're working in doesn't have referential transparency and indeed if the language encourages action-at-a-distance style mutation, then you'll inevitably end up with functions that access mutable state you don't know about. A well-written function will include extensive documentation about these corner cases, but if you've ever worked on any mid-size or larger code base, you know "well-documented function" is a rare sight.

In Haskell, there's no way* to write a function like the above Python function. At worst, we can wrap it in IO

myFunction :: IORef Int -> IO Int
myFunction = readIORef

But now the type signature alone tells us, at a glance, "something fishy is going on here; buyer beware", and even then we can only access the one global variable the IORef gives us access to.

*There's no way to write the function in Haskell short of exploiting unsafePerformIO, behind which there be many dragons. With unsafePerformIO, we can pretty obviously break referential transparency, which is why it's a function called "unsafe" in a module called "unsafe" that every Haskell tutorial ever tells you to forget about and never use.