I'm working with some hierarchical data, a contrived example here:
data Item a =
Leaf a
| ListItems [Item a]
| DoubleItem (Item a) (Item a)
| TripleItem (Item a) (Item a) (Item a)
I find myself often writing code like this:
removeDoubles :: Item a -> Item a
removeDoubles x@(Leaf _) = x
removeDoubles (ListItems xs) = ListItems $ removeDoubles <$> xs
removeDoubles (DoubleItem x y) = ListItems [removeDoubles x, removeDoubles y]
removeDoubles (TripleItem x y z) = TripleItem (removeDoubles x) (removeDoubles y) (removeDoubles z)
removeTriples :: Item a -> Item a
removeTriples x@(Leaf _) = x
removeTriples (ListItems xs) = ListItems $ removeTriples <$> xs
removeTriples (DoubleItem x y) = DoubleItem (removeTriples x) (removeTriples y)
removeTriples (TripleItem x y z) = ListItems [removeDoubles x, removeDoubles y, removeDoubles z]
There's obviously some overlap here between these two functions, and this approach is also not very robust if I start adding more constructors to Item. Is there some typeclass I should be using that would help reduce this repetition? I'm imaginging something like:
removeDoubles :: Item a -> Item a
removeDoubles (DoubleItem x y) = ListItems [removeDoubles x, removeDoubles y]
removeDoubles x = traverseHierarchy removeDoubles x
Functor, Traversable etcetera seem to be designed around changing the type of a, whereas I'm less interested in mapping a -> something and more interested in changing the shape of this hierarchy.
Thanks for any assistance!
I think you are looking for a catamorphism. With more formality and library tooling, the idea below can be generalized and also automated (more on that at the end of this answer). For your specific task, the code that could be automatically generated is similar to
Here
catais generalizing the idea of recursively folding over the structure, and has nothing to do specifically with modifying the hierarchy, for example by removing doubles. But you can use this general machinery to achieve both of your specific goals easily, and without repetition or explicit recursion:catais part of a larger family of concepts called "recursion schemes". I'm sure there are numerous articles attempting to explain these concepts, but An Introduction to Recursion Schemes is one I liked. The Haskell library implementing these concepts, including automatic generation of thecatafunction for your types, isrecursion-schemes. (Really it doesn't generatecata; it generates typeclass instances for your type that allow you to use its predefinedcatafunction, and many more)To use the
cataimplementation inrecursion-schemesinstead of a bespoke implementation for your type, we need several GHC extensions:makeBaseFunctoruses Template Haskell to define a new typeItemF, based onItembut more general, called its "base functor". Its constructors are similar, but have names suffixed withF, and the fields that (inItem) recursively contain anItemhave a more general type, whose details I won't go into.Finally we're ready to implement your
removeDoublesfunction with the minimum of detail: only describe the broad idea and let recursion-schemes do the rest of the work. Note that this is even less duplication than you hoped for in your question: the recursive calls toremoveDoublesare all made implicit, and we can just specify what to do with a single level of the hierarchy.This version of
cata, instead of taking four separate functions for the four constructors ofItem, takes one function, and passes that function anItemFargument, which can be pattern-matched similarly to anItem.We match the
DoubleItemFfirst, and do the "special" work there. Then in the catch-all clause afterwards, we simply useembedto convert theItemFargument back to anItemwithout further modification. In both clauses, thexandyarguments we received have already been recursively handled bycatabefore we see them, so there's no need to do anything to their internals.