Given two standard definitions of binary trees and arbitrarily branching trees (K-trees) in Coq (nat hardcoded for simplicity):
Inductive bintree : Type :=
| Leaf : bintree
| Node : bintree -> nat -> bintree -> bintree.
Inductive tree :=
| TNode : nat -> list tree -> tree.
I'd like to define a function encode : tree -> bintree that converts a K-tree to a binary tree with the same nodes, with the ultimate goal being a definition of decode : bintree -> tree such that forall t, decode (encode t) = t and a proof of this and some other minor properties. A simple idea goes as follows: we encode a tree of node x and children ts as a binary tree where left subtree is encoded tree rooted at its leftmost child, and right subtree is encoded tree rooted at immediate neighbor of x to the right. In Haskell:
data Tree a = TNode a [Tree a]
data BinTree a = Leaf | BNode (BinTree a) a (BinTree a)
encode :: Tree a -> BinTree a
encode t = go [t]
where
go [] = Leaf
go ((TNode x ts) : ts') = BNode (go ts) x (go ts')
An analogous definition in Coq is ill-formed:
Fixpoint encode_list (l: list ntree) :=
match l with
| [] => Leaf
| (TNode x ts) :: ts' => Node (encode_list ts) x (encode_list ts')
end.
Error: Recursive definition of encode_list is ill-formed.
(* ... *)
Recursive call to encode_list has principal argument equal to "ts" instead of "ts'".
Recursive definition is:
"fun l : list tree =>
match l with
| [] => Leaf
| t :: ts' =>
match t with
| TNode x ts => Node (encode_list ts) x (encode_list ts')
end
end".
It's easy to tell that both ts and ts' are strict subterms of l, but Coq's heuristics are to weak to assert that. I tried to apply a similar trick to the one described here, but encoding a tree is done in the context of a node's right sibling, so it's difficult (impossible?) to write a function that given a node and its sibling(s) produces a binary tree but recurses structurally down the structure of only one of its arguments, i.e. this definition is invalid because we iterate down both arguments:
(* ns : right siblings of t *)
Fixpoint encode' (t: tree) (ns: list tree) :=
let '(TNode x cs) := t in
match cs, ns with
| [], [] => Node Leaf x Leaf
| [], n :: ns' => Node Leaf x (encode' n ns')
| c :: cs', [] => Node (encode' c cs') x Leaf
| c :: cs', n :: ns' => Node (encode' c cs') x (encode' n ns')
end.
(* Error: Cannot guess decreasing argument of fix. *)
(* Definition encode (t: tree) = encode' t []. *)
I eventually managed to get my first definition admitted via Program Fixpoint with measure being total number of nodes in a list of trees. But I wonder if this can be achieved without resorting to well-founded recursion? One would think that we should be able to apply the trick with nested fix function in this situation, but I haven't succeeded at doing so. Full snippet below:
Require Coq.Program.Wf.
Require Import Lia.
Require Import List.
Import ListNotations.
Set Implicit Arguments.
Inductive bintree (A: Type) : Type :=
| Leaf : bintree A
| Node : bintree A -> A -> bintree A -> bintree A.
Inductive tree (A: Type) :=
| TNode : A -> list (tree A) -> tree A.
Definition ntree := tree nat.
Arguments Leaf {A}.
Arguments TNode [A].
Fixpoint n_nodes (t: ntree) :=
let '(TNode n ns) := t in
let fix n_nodes_list (l: list ntree) :=
match l with
| [] => 0
| x :: xs => n_nodes x + n_nodes_list xs
end
in 1 + n_nodes_list ns.
Fixpoint sum_n_nodes (l: list ntree) :=
match l with
| [] => 0
| t :: ts => n_nodes t + sum_n_nodes ts
end.
Program Fixpoint encode_list (l: list ntree) {measure (sum_n_nodes l)} :=
match l with
| [] => Leaf
| (TNode x ts) :: ts' => Node (encode_list ts') x (encode_list ts)
end.
Next Obligation.
destruct ts'; simpl; lia.
Defined.
Next Obligation.
destruct ts.
simpl; lia.
simpl sum_n_nodes at 1.
replace (sum_n_nodes (TNode x (t :: ts) :: ts')) with (1 + n_nodes t + sum_n_nodes ts + sum_n_nodes ts') at 1 by reflexivity.
lia.
Defined.
Definition encode (l: ntree) := encode_list [l].
You can't define
encodeusing only a recursive function with a decreasing argument of typelist tree, because it cannot recurse on the head. Instead you must have a function with decreasing argument of typetree, which can call itself on the head of the list.When we say "recursive calls must be made on subterms of the decreasing argument", the exact definition of "subterm" is somewhat sophisticated and counterintuitive. It's not as naive as "t is a subterm of C t, where C is a constructor". There is this odd situation where
tis a subterm ofTNode n (t :: ts), but not oft :: ts(somehow becauselistdoes not occur in the first argument of the constructor_ :: _in the definition oflist). I faintly remember there is a good reason for that subtlety but I can't recall it off the top of my head.You must somehow turn the recursive call of
encode_liston (the contents of) the head of the list into a recursive call ofencode. You will also need a nested recursive function to loop through the list, soencode_listwill still remain.Unfold what
encodeshould be in the interesting case, a rose tree with a nonempty forestTNode n (t :: ts). The goal is to rewrite the right-hand side in terms ofencode t. (In general, you may have to find a more sophisticated auxiliary function, and get the function you want as a special case of it, but in this case it luckily works simply enough.)encode tisencode_list (t :: []), which looks similar but not identical toencode_list (t :: ts).where the underscores are some functions of
tthat I've omitted for brevity. We can rewriteencode_list (t :: ts)in terms ofencode tby replacing its right subtree withencode_list ts, using an auxiliaryappendfunction below:where
The
Leafcase ofappendis left unspecified because it will never happen in the execution ofencode, so we can fill it out arbitrarily:In Coq:
One technical aspect of that definition is that
encode_listis "nested" inencode(fixinside aFixpoint/fix), instead of defined mutually withencode(fix ... with ...). This is not obvious from the "Haskelly" presentation above and relies on more technical details on the definition of "subterm" that the Coq typechecker uses.Alternatively, you can avoid the useless and inelegant branch in
appendby defining the composition ofencodeandappendfirst, and obtainencodeas a special case. Indeed, in the previous definition ofencode, its recursive calls are all passed toappend, and we know the right subtree is always aLeaf.Full gist
Note that there is no
treethat can be encoded to aLeafsodecodemust be partial, or you'll have to reconsider the problem statement somehow.