Efficient implementation of Catamorphisms in Scala

352 Views Asked by At

For a datatype representing the natural numbers:

sealed trait Nat
case object Z extends Nat
case class S(pred: Nat) extends Nat

In Scala, here is an elementary way of implementing the corresponding catamorphism:

  def cata[A](z: A)(l: Nat)(f: A => A): A = l match {
    case Z => z
    case S(xs) => f( cata(z)(xs)(f) )    
  } 

However, since the recursive call to cata isn't in tail position, this can easily trigger a stack overflow.

What are alternative implementation options that will avoid this? I'd rather not go down the route of F-algebras unless the interface ultimately presented by the code can look pretty much as simple as the above.

EDIT: Looks like this might be directly relevant: Is it possible to use continuations to make foldRight tail recursive?

2

There are 2 best solutions below

0
On BEST ANSWER

If you were implementing a catamorphism on lists, that would be what in Haskell we call a foldr. We know that foldr does not have a tail-recursive definition, but foldl does. So if you insist on a tail-recusive program, the right thing to do is reverse the list argument (tail-recursively, in linear time), then use a foldl in place of the foldr.

Your example uses the simpler data type of naturals (and a truly "efficient" implementation would use machine integers, but we'll agree to leave that aside). What is the reverse of one of your natural numbers? Just the number itself, because we can think of it as a list with no data in each node, so we can't tell the difference when it is reversed! And what's the equivalent of the foldl? It's the program (forgive the pseudocode)

  def cata(z, a, f) = {
    var x = a, y = z;
    while (x != Z) {
      y = f(y);
      x = pred(x)
    }
    return y
  }

Or as a Scala tail-recursion,

  def cata[A](z: A)(a: Nat)(f: A => A): A = a match {
    case Z => z
    case S(b) => cata( f(z) )(b)(f)    
  } 

Will that do?

0
On

Yes, this is exactly the motivating example in the paper Clowns to the left of me, jokers to the right (Dissecting Data Structures) (updated, better, but non-free version here http://dl.acm.org/citation.cfm?id=1328474).

The basic idea is that you want to turn your recursive function into a loop, so you need to figure out a data structure that keeps track of the state of the procedure, which is

  1. What you've calculuated so far
  2. What you have left to do.

The type of this state depends on the structure of the type you're doing the fold over, at any point in the fold you are at some node of the tree and you need to remember the tree structure of "the rest of the tree". The paper shows how you can calculate that state type mechanically. If you do this for Lists, you get that the state you need to keep track of is

  1. The operation run on all the previous values.
  2. The list of elements left to process.

Which is exactly what foldl keeps track of, so it's kind of a coincidence that foldl and foldr can be given the same type.