I really like the idea of working with catamorphisms/anamorphisms in a generic way, but it seems to me it has a significant performance drawback:
Suppose we want to work with a tree structure in the categorical way - to describe different folding using a generic catamorphism function:
newtype Fix f = Fix { unfix :: f (Fix f) }
data TreeT r = Leaf | Tree r r
instance Functor TreeT where
fmap f Leaf = Leaf
fmap f (Tree l r) = Tree (f l) (f r)
type Tree = Fix TreeT
catam :: (Functor f) => (f a -> a) -> (Fix f -> a)
catam f = f . fmap (catam f) . unfix
Now we can write functions like:
depth1 :: Tree -> Int
depth1 = catam g
where
g Leaf = 0
g (Tree l r) = max l r
Unfortunately, this approach has a significant drawback: During the computation, new instances of TreeT Int
are created at every level in fmap
just to be immediately consumed by g
. Compared to the classical definition
depth2 :: Tree -> Int
depth2 (Fix Leaf) = 0
depth2 (Fix (Tree l r)) = max (depth1 l) (depth1 r)
our depth1
will be always slower making unnecessary strain on the GC. One solution would be to use hylomorphisms and combine creation and folding trees together. But often we don't want to do that, we may want a tree to be created on one place and then passed somewhere else to be folded later. Or, to be folder several times with different catamorphisms.
Is there a way to make GHC optimize depth1
? Something like inlining catam g
and then fusing/deforesting g . fmap ...
inside?
I believe I found an answer. I remembered reading Why does GHC make fix so confounding? and that suggested me a solution.
The problem with the former definition of
catam
is that it is recursive, and so any attempt to INLINE it is ignored. Compiling the original version with-ddump-simpl -ddump-to-file
and reading the core:is clearly worse (constructor creation/elimination in
catam_$scatam
, more function calls) compared toBut if we define
catam
asthen it is no longer recursive, only
u
inside is. This way GHC inlinescatam
in the definition ofdepth1
and fusesfmap
withdepth1
'sg
- just what we want:which is now just the same as the dump of
depth2
.