I'm trying to implement an interpreter for a programming language with lazy-binding in Haskell.
I'm using the tying-the-knot pattern to implement the evaluation of expressions. However I found it extremely hard to debug and to reason about. I spent at least 40 working on this. I learned a lot about laziness and tying-the-knot, but I haven't reached a solution yet and some behaviors still puzzle me.
Questions
Is there a sensible way to debug the knot and figure out what causes it bottom?
GHC stacktrace (printed when using profiling options) shows which function inside the knot triggers a loop. But that's not helpful: I need to understand what makes the knot strict in the knot's definition, and I couldn't find a way to show this.
It's been really hard to understand why the knot bottoms and I don't think it will be much easier, the next times I have to debug something like this.
How should I tie the knot in a monadic context? I learned that a function like
traverse
is strict for most types and this causes the knot to bottom.The only solution I can think of, is to remove the knot. That would increase the problem's complexity (every value would need to be re-computed every time), although this issue can be resolved by caching the value in a
STRef
: that's exactly what I would do in a strict language. I would prefer to avoid this solution and take advantage of Haskell's laziness, otherwise what's the point of it?In the code I provide later in this post, why does
evalSt e1
terminate, whileevalSt e2
doesn't? I can't still understand what's the difference.
Language's AST
I tried to simplify my AST as much as possible, and this is the most minimal definition I could come up with:
data Expr = Int Int | Negate Expr | Id String | Obj (M.Map String Expr)
deriving (Eq, Ord, Show)
pprint :: Expr -> String
pprint e = case e of
Int i -> show i
Negate i -> "(-" ++ pprint i ++ ")"
Id i -> i
Obj obj -> "{" ++ intercalate ", "
[ k ++ ":" ++ pprint v | (k,v) <- M.toList obj ] ++ "}"
Example programs
Here are a couple of example expressions represented with the AST above:
-- expression: {a:{aa1:(-b), aa2:ab, ab:(-b)}, b:3}
-- should evalutae to: {a:{aa1:-3, aa2:-3, ab:-3 }, b:3}
e1 = Obj $ M.fromList [
("a", Obj $ M.fromList [
("aa1", Negate $ Id "b"),
("aa2", Id "ab"),
("ab", Negate $ Id "b")
]),
("b", Int 3)
]
-- expression: {a:{aa:(-ab), ab:b}, b:3}
-- should evaluate to: {a:{aa:-3, ab:3}, b:3}
e2 = Obj $ M.fromList [
("a", Obj $ M.fromList [
("aa", Negate $ Id "ab"),
("ab", Id "b")
]),
("b", Int 3)
]
Pure eval function
I have then defined a function to evaluate an expression. This is the most simple definition I could write:
type Scope = M.Map String Expr
eval :: Scope -> Expr -> Expr
eval scope expr = case expr of
Int i -> Int i
Id str -> case M.lookup str scope of
Just e -> e
Nothing -> error $ str ++ " not in scope"
Negate aE -> do
case (eval scope aE) of
Int i -> Int $ -i
_ -> error $ "Can only negate ints. Found: " ++ pprint aE
Obj kvMap -> Obj $
let resMap = fmap (eval (M.union resMap scope)) kvMap
in resMap
Tying the knot
The most interesting part in the eval
function is the tying the knot in the Obj kvMap
case:
let resMap = fmap (eval (M.union resMap scope)) kvMap
in resMap
The idea is that in order to compute the expressions in kvMap
, the identifiers need to be able to access both the values in scope
and the results of the expressions in kvMap
. The computed values are resMap
, and to compute them we use the scope resMap ⋃ scope
.
It works!
This eval
function works as expected:
GHCi> pprint $ eval M.empty e1
"{a:{aa1:-3, aa2:-3, ab:-3}, b:3}"
GHCi> pprint $ eval M.empty e2
"{a:{aa:-3, ab:3}, b:3}"
Monadic evaluation
The limitation of the eval
function above, is that it's pure. In some cases I need to evaluate expressions in a monadic context. For instance I may need IO
to offer non-pure functions to the guest language.
I've implemented dozens of versions of eval
(both monadic, using RecursiveDo
, and of various degrees of purity) in an attempt to understand the issues. I'm presenting the two most interesting ones:
Passing the scope through a State monad
evalSt' :: Expr -> State Scope Expr
evalSt' expr = do
scope <- get
case expr of
Int i -> pure $ Int i
Id str -> case M.lookup str scope of
Just e -> pure e
Nothing -> error $ str ++ " not in scope"
Negate aE -> do
a <- evalSt' aE
case a of
Int i -> pure $ Int $ -i
_ -> error $ "Can only negate ints. Found: " ++ pprint aE
Obj obj -> mdo
put $ M.union newScope scope
newScope <- traverse evalSt' obj
put scope
pure $ Obj newScope
evalSt scope expr = evalState (evalSt' expr) scope
This function is able to evaluate the program e1
, but it bottoms (never return) on e2
:
GHCi> pprint $ evalSt M.empty e1
"{a:{aa1:-3, aa2:-3, ab:-3}, b:3}"
GHCi> pprint $ evalSt M.empty e2
"{a:{aa:
I still don't understand how it can compute e1
, since it does contain Id
s: isn't that program strict on the scope and shouldn't it bottom evalSt
? Why it doesn't? And what's different in e2
to cause the function the function to terminate?
Evaluating in the IO monad
evalM :: Scope -> Expr -> IO Expr
evalM scope expr = case expr of
Int i -> pure $ Int i
Id str -> case M.lookup str scope of
Just e -> pure e
Nothing -> error $ str ++ " not in scope"
Negate aE -> do
a <- evalM scope aE
case a of
Int i -> pure $ Int $ -i
_ -> error $ "Can only negate ints. Found: " ++ pprint aE
Obj kvMap -> mdo
resMap <- traverse (evalM (M.union resMap scope)) kvMap
pure $ Obj resMap
This function always bottoms (never returns) on every program that uses at least one Id
node. Even just {a:1, b:a}
.
Your pure evaluation function relies on there being no evaluation order in the semantics of Haskell, so that thunks get forced only when needed. In contrast, most effects are fundamentally ordered, so there is an incompatibility there.
Some monads are lazier than the others, and for those you can get some result out of making your evaluation function monadic, as you've seen with
evalSt e1
. The two most common lazy monads areReader
and lazyState
(which is the one you get fromControl.Monad.State
, as opposed toControl.Monad.State.Strict
).But for other effects, such as
IO
, you must control evaluation order explicitly, and that means implementing the cache for lazy evaluation explicitly (viaSTRef
for example), instead of implicitly relying on Haskell's own runtime.To see what is going wrong, unfold
traverse evalSt' obj
whereobj
is{aa:(-ab), ab:b}
.e2
, that ends up looking at the value of the"aa"
field, which isx
in the code above.x
comes fromcase a of ...
, which needsa
.a
comes fromevalSt' (Id "ab")
, which needs the field"ab"
, which isy
(from the knot tying surrounding thetraverse evalSt' obj
we are looking at).y
comes fromcase M.lookup "b" scope2 of ...
, which needsscope2
.scope2
comes fromget
, which gets the output state from the action preceding it, which is evaluatingx
.x
(from step 2). Hence there is an infinite loop.This can be fixed by always restoring the state at the end of
evalSt'
(technically, you only need to do this forId
andNegate
, but might as well do it always):Or use
Reader
instead, which gives you the power to update state locally for subcomputations, which is exactly what you need here. You can uselocal
to surroundtraverse evalSt' obj
:I don't have a good answer to this. I'm not familiar with debugging tools in Haskell.
You cannot rely on stack traces because subexpressions may force each other in a rather chaotic order. And there is something interfering with print-debugging (
Debug.Trace
) that I don't understand. (I would addDebug.Trace.trace (pprint expr) $ do
at the beginning ofevalSt'
, but then the trace doesn't make sense to me because things that should be printed once are replicated many times.)