Is there some way to reduce the pain of range tracking?

437 Views Asked by At

There's currently a pull request by Jonathan S. to replace the implementation of Data.IntMap with one explained in this README based on ideas from a blog post by Edward Kmett.

The basic concept Jonathan S. developed from is that an IntMap is a binary tree that looks like this (I've made some slight changes to his development for the sake of consistency):

data IntMap0 a = Empty | NonEmpty (IntMapNE0 a) 
data IntMapNE0 a =
    Tip !Int a
  | Bin { lo :: !Int
        , hi :: !Int
        , left :: !(IntMapNE0 a)
        , right :: !(IntMapNE0 a) }

In this representation, each node has a field indicating the least and greatest key contained in the IntMapNE0. Using just a little bit fiddling allows this to be used as a PATRICIA trie. Jonathan noted that this structure has almost twice as much range information as it needs. Following a left or right spine will produce all the same lo or hi bounds. So he cut those out by only including the bound not determined by the ancestors:

data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
    Tip a
  | IntMapNE1 { bound :: !Int
              , left :: !(IntMapNE1 a)
              , right :: !(IntMapNE1 a)

Now each node has either a left bound or a right bound, but not both. A right child will have only a left bound, while a left child will have only a right bound.

Jonathan makes one further change, moving the values out of the leaves and into the internal nodes, which places them exactly where they are determined. He also uses phantom types to help track left and right. The final type (for now, anyway) is

data L
data R
newtype IntMap a = IntMap (IntMap_ L a) deriving (Eq)
data IntMap_ t a = NonEmpty !Int a !(Node t a) | Empty deriving (Eq)
data Node t a = Bin !Int a !(Node L a) !(Node R a) | Tip deriving (Eq, Show)

Certain aspects of this new implementation are quite attractive. Most importantly, many of the most-used operations are substantially faster. Less importantly, but very nicely, the bit fiddling involved is much easier to understand.

However, there is one serious pain point: passing the missing range information down through the tree. This isn't so bad for lookups, insertions, etc., but gets pretty seriously hairy in the union and intersection code. Is there some abstraction that would allow this to be done automatically?

A couple extremely vague thoughts:

  1. Could the phantom types be used with a custom class to tie treatment directly to handedness?

  2. The "missing piece" nature is somewhat reminiscent of some zippery situations. Might there be a way to use ideas from that realm?


I've started thinking about using an intermediate type of some sort to provide a symmetrical "view" of the structure, but I'm a bit stuck. I can fairly easily convert back and forth between the basic structure and the fancy one, but that conversion is recursive. I need a way to convert only partially, but I don't know nearly enough about fancily built types to get it done.

1

There are 1 best solutions below

0
Joachim Breitner On

Is there some abstraction that would allow this to be done automatically?

You should be able to define a set of pattern synonyms that give you that. I’ll start from the second-to-last variant of your code, i.e.:

data IntMap1 a = Empty | NonEmpty { topLo :: !Int, child :: !(IntMapNE1 a) }
data IntMapNE1 a =
    Tip a
  | IntMapNE1 { bound :: !Int
              , left :: !(IntMapNE1 a)
              , right :: !(IntMapNE1 a)

We tuple such a value with the bound from the parent in an Either (indicating whether it is a low or a high bound).

viewLoHi (Left lo, IntMapNE1 hi left right)
    = Just (lo, hi, (Left lo, left), (Right hi, right)
viewLoHi (Right hi, IntMapNE1 lo left right)
    = Just (lo, hi, (Left lo, left), (Right hi, right)
viewLoHi _
    = Nothing

pattern Bin' lo hi left right <- (viewLoHi -> Just (lo, hi, left, right))

The top-level data type is different, so it needs its own pattern synonym

viewLoHi' (NonEmpty lo child) = viewLoHi (Left lo, child)
viewLoHi' Empty = Nothing

pattern NonEmpty' lo hi left right <- (viewLoHi' -> Just (lo, hi, left, right)

Using only NonEmpty' and Bin' as you traverse the tree, the bookkeeping should now be completely hidden. (Code not tested, so there will be typos here)