Haskell implementation questions


Haskell is a really great language. I love it. But as a C++ programmer with some basic knowledge of computer architecture, I really want to know the implementation details of Haskell.

Take the map function, for example. I know its syntax and what it produces. However, I want to know how this function really works in memory. C-family languages have a very clear mapping between the source code and the machine's behavior.

So does anybody have insight into the machine behavior behind functional programming's syntax? Or any books about this, like "Inside the C++ Object Model"?


There are 3 best solutions below


The basic idea to implement lazy functional languages is called Graph Reduction.
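A key consequence of graph reduction is sharing: a let-bound expression is a single node in the heap graph, so it is evaluated at most once no matter how many times it is used. A minimal sketch (using Debug.Trace purely to observe when evaluation happens; with GHC's default evaluation the message appears only once):

```haskell
import Debug.Trace (trace)

main :: IO ()
main = do
  -- 'x' is one heap node (a thunk), shared by both uses below.
  let x = trace "evaluating x" (2 + 3 :: Int)
  -- Forcing 'x + x' reduces the shared node once, then reuses the result.
  print (x + x)
```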

"The Implementation of Functional Programming Languages" is a detailed, if older book on the subject: http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

A more tutorial-like introduction: http://research.microsoft.com/en-us/um/people/simonpj/papers/pj-lester-book/

For more on how GHC is implemented, you might look at the "Spineless Tagless G-Machine" (STGM) papers, e.g.: http://www.dcc.fc.up.pt/~pbv/aulas/linguagens/peytonjones92implementing.pdf


First, a warning: One of the fundamental properties of Haskell is that the compiler is highly likely to perform very radical transformations of your code during compilation. So the code that actually executes may not resemble what you wrote very closely at all.

That caveat aside, to a first approximation you can expect every variable and every data field to be a pointer to a heap object. Some of these heap objects represent data (integers, bools, characters, list nodes, etc.), and some of them represent Haskell code that hasn't executed yet due to laziness. If you write a long, complex expression, each and every sub-expression becomes a heap object, with the top-level expressions pointing to the lower ones. So the entire expression graph of your program becomes an object graph on the heap.
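You can observe this laziness directly: a sub-expression that is never inspected is never executed, because it just sits on the heap as an unevaluated object. A small sketch:

```haskell
main :: IO ()
main = do
  -- The pair node points at two unevaluated heap objects (thunks).
  let pair = (1 + 1 :: Int, undefined :: Int)
  -- 'fst' only follows the first pointer, so the 'undefined' thunk
  -- is never forced and the program does not crash.
  print (fst pair)
```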

(Graph /= tree. A tree has no "loops" in it. Haskell allows recursion, so Haskell expressions aren't necessarily expression trees.)
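The classic illustration of such a loop is a self-referential list: the tail pointer of the heap node points back at the node itself, giving a finite graph that represents an infinite value.

```haskell
-- A cyclic heap graph: the cons cell's tail points back to itself.
ones :: [Int]
ones = 1 : ones

main :: IO ()
main = print (take 5 ones)  -- a finite prefix of the infinite list
```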

Big Haskell expressions become a whole bunch of heap objects. Complicated nested pattern matches get desugared into a series of single-layer ones in an optimal order. All the other syntax sugar of the language gets stripped away, until there are only 6 primitives left:

  1. Create variable
  2. Use variable
  3. Create function
  4. Call function
  5. Create data record
  6. Inspect data record
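As a rough sketch of what the desugared form looks like, here is map written using only those primitives: variable bindings, lambdas, application, constructor application, and case analysis (the name myMap is made up; this is in the spirit of GHC's Core language, not its exact syntax):

```haskell
-- map, restricted to the six primitives: create/use variables,
-- create/call functions, create data ((:), []), inspect data (case).
myMap :: (a -> b) -> [a] -> [b]
myMap = \f -> \xs ->
  case xs of
    []     -> []
    y : ys -> (:) (f y) (myMap f ys)

main :: IO ()
main = print (myMap (* 2) [1, 2, 3 :: Int])
</imports>
```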

If you want to know how the actual bits and bytes work, that's down to the compiler. If you mean GHC, it works approximately like this:

  • Every heap object is either code or data.

  • If it's data, it contains a value constructor ID number and one pointer for each of that constructor's fields.

  • If it's code (i.e., some subexpression that hasn't executed yet), it contains a pointer to the function's machine code block, and one pointer for each of the function's arguments (no currying).

  • When you try to do a conditional branch, I/O operation or seq on a heap object, the run-time jumps to some code pointed to by the heap object.

  • If the object represents data, it will point to some code that instantly returns.

  • If the object represents code, it will point to the code that actually implements the function. This code computes the function's answer, and overwrites the original heap object with a data object representing the answer.

  • Either way, when control returns to the caller, the heap object will definitely be a data object, and it can now inspect the constructor ID or whatever else it wanted to do.
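The enter-and-overwrite mechanism above can be modeled in a few lines of Haskell. This is a toy model only, not GHC's actual object layout: the names Node, newThunk, and force are invented here, and an IO action stands in for the compiled code block.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- A heap node is either unexecuted code (a suspended IO action)
-- or computed data.
data Node a = Thunk (IO a) | Value a

newThunk :: IO a -> IO (IORef (Node a))
newThunk act = newIORef (Thunk act)

-- "Entering" a node: a data object returns its value immediately;
-- a code object runs once and is overwritten with the result,
-- mimicking GHC's thunk update.
force :: IORef (Node a) -> IO a
force ref = do
  node <- readIORef ref
  case node of
    Value v   -> return v            -- already data: return instantly
    Thunk act -> do
      v <- act                       -- run the suspended code
      writeIORef ref (Value v)       -- overwrite code with data
      return v

main :: IO ()
main = do
  n <- newThunk (putStrLn "computing" >> return (6 * 7 :: Int))
  a <- force n   -- runs the code, prints "computing"
  b <- force n   -- the node is now data; nothing is printed
  print (a, b)
```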


There is a very nice book that goes into the detail of implementing a functional language using lazy graph reduction:

Implementing functional languages: a tutorial. Simon Peyton Jones and David Lester, 1992.

This book gives a practical approach to understanding implementations of non-strict functional languages using lazy graph reduction. The book is intended to be a source of practical labwork material, to help make functional-language implementations "come alive", by helping the reader to develop, modify and experiment with some non-trivial compilers.

The unusual aspect of the book is that it is meant to be executed as well as read. Rather than merely presenting an abstract description of each implementation technique, we present the code for a complete working prototype of each major method, and then work through a series of improvements to it. All of the code is available in machine-readable form.

Note that this book is different from The Implementation of Functional Programming Languages, Simon Peyton Jones, 1987 (which is also very interesting).