What happens if you compile a program that takes no input? (Haskell IO purity issues (again))

789 Views Asked by At

putStrLn when called with any arguments will always return a value of type IO (). I agree that's pure, I can handle that. But is it referentially transparent? I think so, because for any given input you could replace the function call with an IO () which would throw the correct string at stdout.

So I'm cool with putStrLn, but getLine when called with no arguments could return any number of things provided they are of type IO String. That is neither pure nor referentially transparent right?

Silly pedantic question and it's probably not going to change how I write my code, but I really want to nail this once and for all. (I understand that the IO monad will sequence things correctly, that's not my issue)

This raises another question for me. Is the compiler smart enough to recognise a program that takes no input? For example say I compile

main = putStrLn . show $ map (+1) [1..10]

Is GHC smart enough to reduce that program to the IO () that causes [2,3,4,5,6,7,8,9,10,11] to be printed out? Or does it still work it out and evaluate/execute everything at runtime? Same goes for arbitrary programs where Input is not required. Does GHC employ the fact that the entire program is referentially transparent and can simply be replaced with it's value?

6

There are 6 best solutions below

0
On BEST ANSWER

I think there are two questions here.

  1. Is IO referentially transparent
  2. Will GHC reduce arbitrary expressions at compile time

Looking at the type of IO, you can imagine that it emulates referential transparency by relying on mysterious value RealWorld that does not have a data constructor, and by making each statement depend on the last (in a single threaded world). In the case of an IO String, this is a newtype wrapper around RealWorld -> (RealWorld, String)... which is a function, not a value. Using IO without the Monad instance makes this particularly, and painfully, obvious.

Prelude GHC.Types> :info IO
newtype IO a
  = IO (GHC.Prim.State# GHC.Prim.RealWorld
        -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))

As for GHC's optimization, in this case it does not reduce the list to a string at compile time. The optimized code produced by GHC 7.2.1 lazily generates a list, maps (+1) over the results, converts the list to a string, and finally prints it to the console. Pretty much exactly as it reads in your example.

2
On

getLine :: IO String is pure; its value is the IO action which reads and returns* a string from the standard input. getLine is always equal to this value.

*I use the word "returns" here for the lack of a better word.

Wikipedia defines referential transparency as:

An expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program (in other words, yielding a program that has the same effects and output on the same input).

So getLine is referentially transparent too. Though I can't think of a nice way to express its "value" in some other way for the purposes of "replacing the expression with its value".

Also, one should be a bit careful with statements like "putStrLn when called with any arguments will always return IO ()". IO () is a type, not a value. For every s :: String, putStrLn s is a value of type IO (), yes. But what this value is, depends on s, of course.

(Besides, if you exclude those unsafe things, everything is pure and referentially transparent, and in particular so is getLine.)

1
On

Regarding the second part of the question. There's something called supercompilation which would hopefully pick up on something like that. It's still an area of research.

2
On

Let me just answer the second part of the question (I have answered the first part in an earlier question). The compiler is free to do whatever it wants to the expression as long as it doesn't change the semantics of the program. So you must ask the question about a specific compiler for it to make sense. Does ghc? No, not the current version. Are there any compilers that do? Yes, there is.

0
On

Yes, those monadic functions are pure referentialy transparent since the substitution rule still applies to them.

In Haskell, the following two programs are equivalent

main = (puStrLn "17" >> puStrLn "17")
main = let x = putStrLn "17" in (x >> x)

In a "normal" language the second example would only print once, as a side-effect of evaluating x. The way the two programs are actually the same becomes a little clearer when you realize that a value of type IO() is not actually a side-effecting computation but is actually a description of such a computation that you can use as building block to build larger computations from.

0
On

I'm not sure if this will help (I apologise in advance if it only confuses more), but the way IO is made referentially transparent in Mercury is to explicitly pass a value of type io to all IO-performing code, which must also return a new value of type io.

The input io represents "the state of the world" just before the code is called. The entire world outside the program; disk contents, what's printed on the screen, what the user is about to type, what's about to be received from the network, everything.

The output io represents the state of the world just after the code is called. The difference between the input io and the output io contains the changes to the world that were made by that code (plus everything else that happened externally, in theory).

Mercury's mode system ensures that values of type io are unique; there's only ever one of them, so you can't pass the same io value to two different IO-performing procedures. You pass an io into a procedure, rendering it useless to you, then receive a new one back.

Of course the real state of the actual world isn't encoded into values of type io; in fact under the hood io is completely empty! There's no information being passed at all! But the io values represent the state of the world.

You can think of functions in the IO monad as doing the same. They take an extra implicit state-of-the-world argument, and return an extra implicit state-of-the-world value. The IO monad implementation handles passing this extra output on to the next function. This makes the IO monad very like the State monad; it's easy to see that get in that monad is pure, even though it appears to take no arguments.

In that understanding, main receives the initial state of the world before your program runs and transforms it into the state of the world after the program is run, by threading it through all the IO code in your program.

And because you can't get a state-of-the-world value yourself, you have no way of starting your own little IO chain in the middle of other code. This is what ensures purity, since in actual fact we can't have a brand new world with its own state spring out of nowhere.

So getLine :: IO String can be thought of as something like getLine :: World -> (World, String). It's pure, because all those different times it's called and returns different strings it received a different World each time.

Whether you think about values that are IO actions, or the state-of-the-world being passed around between functions, or any other mechanism, all these constructs are representational. Under the hood, all IO is implemented with impure code, because that's how the world works; when you write to a file, you have changed the state of the disk. But we can represent this at a higher level of abstraction, allowing you to think about it differently.

An analogy is that you can implement a map with search trees or hash tables or a zillion other ways. But having implemented it, when you're reasoning about code that uses the map, you don't think about left and right sub-trees or buckets and hashes, you think about the abstraction that is a map.

If we can represent IO in a way that maintains purity and referential transparency, then we can apply any reasoning that requires referential transparency to code using this representation. This allows all the mathematics that applies to such code to work (much of which is used in the implementation of advanced compilers for purity-enforced languages), even for programs that perform IO.


And a quick addendum about your second question. GHC could theoretically reduce that input program to just the output. I don't believe it tries terribly hard to do so though, because this is undecidable in general. Imagine a program that took no input but generated an infinite list and then printed its last 3 elements. Theoretically any program that is not dependent on its input can be reduced to its output, but in order to do that the compiler has to do something equivalent to executing the program at compile time. So to do this fully generally you'd have to be happy for your programs to sometimes go into infinite loops at compile time. And almost every program is dependent on its input, so there's not a lot to be gained by even trying to do this.

There is something to be gained by identifying parts of programs that aren't dependent on any input and replacing them with their result. This is called partial evaluation, and it's an active research topic, but it's also very hard and there's no one-size-fits-all solution. To do it, you have to be able to identify areas of the program that won't send the compiler into an infinite loop trying to figure out what they return, and you have to make make decisions about whether removing some code that takes a few seconds at run time is a good enough benefit if it means embedding the multi-hundred-megabyte data structure it returns in the program binary. And you have to do all this analysis without taking hours to compile moderately complex programs.