Compute infinite tree from rooted paths using delay modality

215 Views Asked by At

This is a variation on "Compute an (infinite) tree from fixpoint operator using delay modality".

The setting. We study a language of binary trees, augmented with the ability to refer to arbitrary other nodes in the tree by a path from the root:

type Path = [Bool]
data STree = SNode STree STree
           | SPath Path
           | SLeaf Int
    deriving (Show)

A path is defined in the context of some root, and identifies the subtree found by successively following left/right children when we see False/True in the path. For example, the path [False, True] identifies SLeaf 2 in the tree SNode (SNode (SLeaf 1) (SLeaf 2)) (SLeaf 3).

Such trees can be used, for example, to identify arbitrary flow graphs (including irreducible graphs, something that is not possible using fixpoint operators.)

We can consider the infinite tree induced by this description.

data Tree = Node Tree Tree
          | Leaf Int
    deriving (Show)

Here is a conversion function from one to the other, utilizing an auxiliary function follow that finds the subtree at some path of a tree:

follow :: Path -> Tree -> Tree
follow [] s = s
follow (False : p) (Node s _) = follow p s
follow (True  : p) (Node _ s) = follow p s
follow _ (Leaf _) = error "bad path"

flatten' :: Tree -> STree -> Tree
flatten' root (SPath p) = follow p root
flatten' root (SNode s1 s2) =
    Node (flatten' root s1) (flatten' root s2)
flatten' _ (SLeaf i) = Leaf i

flatten :: STree -> Tree
flatten s = fix (\root -> flatten' root s)

Unfortunately, this function is not productive: it infinite loops on flatten (SPath []).

The problem. We now consider a variant of this structure augmented with the delay modality (as described in "Compute an (infinite) tree from fixpoint operator using delay modality"), as well as a Loop constructor to indicate self-referential loops.

data Tree = Node (D Tree) (D Tree)
          | Leaf Int
          | Loop
    deriving (Show)

Without using non-structurally recursive function calls (so, you can structurally recurse on STree and Path), write a function STree -> Tree (or similar) which unfolds the infinite tree.

Postscript. In many cases, we don't want to compute an infinite unfolding: we want the finite unfolding if it exists, and an error otherwise. Specifically, given the original data type:

data Tree = Node Tree Tree
          | Leaf Int
    deriving (Show)

Without using non-structurally recursive, we might want to write a function STree -> Maybe Tree (or similar) which unfolds the syntax into a finite tree if it exists, and fails otherwise. The lack of a delay modality in this structure guarantees that it is finite.

As there are no instances of the delay modality in this structure, it seems impossible to do this with fixD: we will get a delayed result which we can never use. It would seem that we must convert the tree into a graph, topologically sort it, and then directly implement the algorithm ala Gaussian elimination without using fixD.

Is there an alternate typing discipline which allows us to implement the unfoldings as elegantly as in the original (wrong) code? If so, you may just have (re)discovered another algorithm for finding cycles in graphs.

1

There are 1 best solutions below

0
On

Well, here are at least some partial observations on the problem.

The particular formulation of the problem I wrote down is perhaps a bit too difficult; more difficult than was intended. Here is one particularly nasty example:

SNode (SVar [True, False]) (SVar [])

This is not a well-formed graph, but the cycle only becomes evident after unfolding the SVar [] occurrence. Replace the False with True and it becomes well-formed.

The intent was to be able to express irreducible graphs, and in fact, there is a simpler syntax based on letrec which can achieve this goal. We directly adopt the infinite binary tree representation from Oliveira-Cook's "Functional Programming with Structured Graphs" (https://www.cs.utexas.edu/~wcook/Drafts/2012/graphs.pdf) paper (without the PHOAS, and constructors renamed for consistency):

data STree
  = SVar Var
  | SMu [Var] [STree] -- first tree gets "returned"
  | SLeaf Int
  | SNode STree STree

SMu is a letrec-style binding operation: SMu ["x", "y"] [t1, t2] is essentially letrec x = t1; y = t2 in x. Instead of writing down the path to the node you want, you simply give it a name. Additionally, this makes this problem a lot more similar to the previous StackOverflow question. (Compute an (infinite) tree from fixpoint operator using delay modality) So, can we solve it the same way?

The answer is "Yes, however..." The caveat occurs in this expression: SMu ["x", "y"] [SVar "y", SLeaf 0]. This "recursive" binding is not recursive at all—but it will still be rejected by a delayed context style algorithm, as the variable y is not available at the time we want to use it (it is not guarded.) In fact, this precisely corresponds to restriction suggested in "Forbidding empty cycles": the inserted occurrences of f serves as a guardedness check to ensure cycles cannot occur.

Clearly, the intent is that all of the bindings in SMu be strongly connected; thus, our algorithm is only appropriate if we first compute strongly connected components of the bindings, preprocessing into truly recursive bindings (so we do not reject non-recursive bindings spuriously.) Non-recursive bindings can be processed in order without fixD. Indeed, this matches how bindings in, e.g., Haskell, are handled in practice: we first separate bindings into strongly connected components, and then process the SCCs one at a time, tying the knot in these cases (and erroring when we detect an empty cycle in the SCC.)

But that's not all. Consider SMu ["x" "y"] [SVar "y", SNode (SVar "x") (SVar "x")]. All the components are strongly connected, y is unguarded, and yet there are no unguarded loops. So it is not enough to just topsort: we have to delete bare variables as well. Fortunately, this is fairly trivial (replace all occurrences of "x" with "y", in this case.)

So, what does this mean about the original problem? I do not think it is completely solved: the rooted paths make it difficult to say what the "topologically sorted" version of the tree is supposed to be in the first place!