Why aren't there function headers in OCaml?

1k Views Asked by At

In certain programming languages, most notably in C, there are header files with function declarations. These function "headers" come before the code, and are required for situations with mutual recursion. When the function headers are placed in a header file, they also help with linking situations where multiple C files are compiled together.

My understanding of function headers within a C file is that they help with compilation because they define the typing of the function if it is called before it is defined. If this is wrong, I would be happy to stand corrected and better informed, but this is my understanding of it.

So what I don't understand, is why other languages - in this case I'm singling out OCaml - don't have function headers. In OCaml, there modules with signatures, but a signature does not allow your instantiations of functions to be mutually recursive, even though the typings are given. In order to have mutual recursion in OCaml, you need to use either the "and" keyword, or defined one function locally within the other function (to the best of my knowledge).

(* version 1 *)
let rec firstfun = function
  | 0 -> 1
  | x -> secondfun (x - 1)
and secondfun = function
  | 0 -> 1
  | x -> firstfun x

(* version 2 *)
let rec firstfun = function
  | 0 -> 1
  | x -> 
       let rec secondfun = function
         |0 -> 1
         | x -> firstfun x in
       secondfun (x - 1)

So why is this the case? Is it related to polymorphism? Is there a compilation bottleneck I'm not considering?

3

There are 3 best solutions below

0
On BEST ANSWER

In fact, this is not about headers, as they are just pieces of code, that are copy-pasted into files using the preprocessor. The root of your question is why we need a full definition of an external value, in order to compile a unit. For example, in C, you can freely use any external function, given that you provided the prototype, and in older versions of C you don't even need the prototype.

The reason why it works is because a compilation unit, produced by a C compiler, will contain unresolved symbols (e.g., functions or data values defined somewhere else), and it is the job of a linker to connect those symbols to their respective implementations.

OCaml, underneath the hood, uses a C-like format for compilation units and is actually relying on a system C linker to produce an executable. Thus there are no practical or theoretical limitations so far.

So what is the problem? As long, as you can provide a type of an external symbol, you can easily use it. And here is the main problem -- if there is a mutual dependency (on a type level) between two compilation units (i.e., files), the OCaml typechecker can't infer the type. This limitation is mostly technical, as the typechecker works on a compilation unit level. The reason for this limitation is mostly historical, as at the time when OCaml compiler was written the separate compilation was trendy, and thus a decision was made to apply the typechecker on a per file basis.

To summarize, if you're able to provide types that OCaml type checker will accept, then you can easily induce a mutual recursion between your compilation units. To move the typechecker from your way, you can use different mechanisms, e.g., the Obj module, or external definitions. Of course, in that case, you're taking the whole responsibility on the soundness of your type definitions. Basically, the same as in C language.

Since you can always produce a whole-program compilation system from a compiler, that supports the separate compilation, by just concatenating all modules in one big module, the alternative solution would be to use some sort of a preprocessor that will collect all the definitions, and copy them into a single file (i.e., one compilation unit), and then apply the compiler to it.

3
On

Ocaml has module interface files .mli which play a role equivalent of header files .h in C and C++. Read the §2.5 Modules and Separate Compilation chapter of the Ocaml documentation. Actually what is important is the .cmi compiled form of these module interface files. (In C with GCC precompiled headers are very limited).

BTW, C++ header files are a nightmare. Most standard C++ headers (e.g. <map>) are actually pulling many thousands of lines. On my GCC6 Linux system, a simple file containing just #include <vector> is expanded to more than ten thousand lines (and this is one of the reason why compiling C++ code can be quite slow).

The C++ standardization team has discussed the incorporation of modules in the standard, but so far this has not been accepted. BTW, this suggests that modules are useful and cleaner than header files. Read Clang's Modules documentation.

Most recent programming language have some kind of compiled modules or packages (Ada, Go, D, ...)

C header files are a bit less messy (but C has no genericity) and practically smaller than C++ ones.

An Ocaml implementation file can (and often has) quite long let rec top-level definitions like

let rec 
 foo x y = 
 (* many lines *)
 .....
and 
 bar z t = 
 (* many lines *)
 ....
and 
 freezbee u v w = 
 (* many lines *)
 ....

(**** the above let rec definition can span many thousand lines *)

If you need to define mutually recursive functions across several Ocaml modules, you can use recursive modules as answered by kne or you could define references initialized to stub functions raising an exception:

let refoo = ref (fun (x : int) -> failwith "refun unitialized");;

in other parts of the same module, you would call !refoo y (instead of e.g. calling a foo y....) and you'll need some code which would fill refoo with a meaningful function. IIRC this trick is used a few times in the Ocaml compiler itself.

1
On

First, you seem to suggest that the main purpose of C header files is mutual recursion. I do not agree with that. Rather I see C header files as an attempt at interface/implementation separation. In that sense, OCaml does not lack anything when compared to C.

Now for mutual recursion itself: It is not supported across different compilation units. (It is supported between different modules in the same compilation unit.) But it can be somehow emulated using higher-order functions. Before I come to that, your example is not correct OCaml syntax. A corrected version is:

(* version 1 *)
let rec firstfun = function 0 -> 1
  | x -> secondfun (x - 1)
and secondfun = function 0 -> 1
  | x -> firstfun x

Now, how to spread that into different modules that may as well be in different compilation units (with the side-condition, that in this case First_module must be compiled first):

module First_module = struct
  let rec firstfun secondfun = function 0 -> 1
    | x -> secondfun (x - 1)
end

module Second_module = struct
  let rec secondfun = function 0 -> 1
    | x -> First_module.firstfun secondfun x
end

Observe that the type of First_module.firstfun is not int -> int as in your example, it is (int -> int) -> int -> int. That is, the function takes an additional argument, namely the yet-to-be-defined secondfun. So we have replaced linking (which happens at compile time) by argument passing (which happens at runtime (Although a sufficiently optimizing compiler might replace that by linking again.)).

(Side note: The rec in the definition of firstfun is not needed any more. I left it there for similarity with your code.)

EDIT:

In a comment you extend your question:

That being said, I am wondering why OCaml requires let-and notation within a module, as opposed to having predeclared function headers at the top of the module definition.

Usefulness aside, that actually can be emulated with a recursive module (Warning: As of 4.02 that feature is marked as experimental.) as follows:

module rec All : sig
  val firstfun : int -> int
  val secondfun : int -> int
end = struct

  let firstfun = function 0 -> 1
    | x -> All.secondfun (x - 1)

  let secondfun = function 0 -> 1
    | x -> All.firstfun x

end

include All

The overall answer to your question probably is: C and OCaml are different languages. They do things in different ways.