Consider the following toy program that computes all combinations of character substitutions in a word, of the kind often used in passwords.
import Data.Char (isLower, toUpper)
variants :: String -> [String]
variants "" = [""]
variants (c:s) = [c':s' | c' <- subst c, s' <- variants s]
where subst 'a' = "aA@"
subst 'e' = "eE3"
subst 'i' = "iI1"
subst 'l' = "lL1"
subst 'o' = "oO0"
subst 's' = "sS$5"
subst 'z' = "zZ2"
subst x | isLower x = [x, toUpper x]
subst x = [x]
main :: IO ()
main = putStrLn $ show $ length $ variants "redistributables"
I compile it with and without optimizations:
$ ghc -fforce-recomp -Wall Test.hs -o test0
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking test0 ...
$ ghc -fforce-recomp -O -Wall Test.hs -o test1
[1 of 1] Compiling Main ( Test.hs, Test.o )
Linking test1 ...
Now test0
and test1
produce the same output, but test1
uses much more memory and spends most of its time in garbage collection:
$ ./test0 +RTS -s 2>&1 | grep total
2 MB total memory in use (0 MB lost due to fragmentation)
Productivity 93.2% of total user, 93.3% of total elapsed
$ ./test1 +RTS -s 2>&1 | grep total
188 MB total memory in use (0 MB lost due to fragmentation)
Productivity 15.0% of total user, 15.0% of total elapsed
Why?
I’m using GHC 7.4.1; I should probably use a newer compiler, but this is what I have handy at the moment, and the fault probably lies with me anyway.
You want
to be compiled into an outer loop and an inner loop. But GHC sees that the inner loop does not depend in any way on the outer "loop counter". Therefore, the full laziness transformation lifts the inner loop out of the outer loop. One fairly effective trick is to hide the fact that the inner loop is independent. This is done by splitting the inner loop off into a separate function taking a dummy argument, and hiding the dumminess by marking the function as
NOINLINE
. Then you can call the function with the outer loop counter, and GHC will generally refrain from messing with you.