Understanding the attoparsec implementation (part 2)

284 Views Asked by At

I am currently trying to study and understand the source code of the attoparsec library, but there are some details I can't figure out myself. For example, the definition of the Parser type:

newtype Parser i a = Parser {
      runParser :: forall r.
                   State i -> Pos -> More
                -> Failure i (State i)   r
                -> Success i (State i) a r
                -> IResult i r
    }

newtype Pos = Pos { fromPos :: Int }
            deriving (Eq, Ord, Show, Num)

data IResult i r =
    Fail i [String] String
  | Partial (i -> IResult i r)
  | Done i r

type Failure i t   r = t -> Pos -> More -> [String] -> String
                       -> IResult i r
type Success i t a r = t -> Pos -> More -> a -> IResult i r

What I don't fully understand yet is the usage of the type-parameter r. What would be different if I defined the type signature of runParser like this:

State i -> Pos -> More -> Failure i (State i) a -> Success i (State i) a a -> IResult i a

?

Can you please help me to understand what forall r. exactly means in this case and why it is necessary to use it in the runParser's type signature?

Many thx in advance!

UPDATE: To clarify my question further: What I currently don't understand is why it is necessary to bring in the type-parameter r in the first place. One could imagine, that the Parser type could have been also defined like this:

newtype Parser i a = Parser {
      runParser ::
                   State i -> Pos -> More
                -> Failure i (State i) a
                -> Success i (State i) a
                -> IResult i a
}

data IResult i a =
    Fail i [String] String
    | Partial (i -> IResult i a)
    | Done i a

type Failure i t a  = t -> Pos -> More -> [String] -> String
                      -> IResult i a
type Success i t a = t -> Pos -> More -> a -> IResult i a

where the type-parameter r is not used at all. And my question is why this definition would be "wrong" and what problems it would entail...

2

There are 2 best solutions below

1
On BEST ANSWER

attoparsec creates continuation passing style (CPS) parsers and without the forall we wouldn't be able to chain parsers together.

Here is a dramatically simplified version of the the types involved and definition of bindP - the monadic bind operator. We have eliminated the failure continuation and the input source.

{-# LANGUAGE Rank2Types #-}

type IResult r = r
type Success a r = a -> IResult r  -- Success a r == a -> r

newtype Parser a = Parser {
      runParser :: forall r. Success a r
                -> IResult r
    }

bindP :: Parser a -> (a -> Parser b) -> Parser b
bindP m g =
    Parser $ \ks -> runParser m $ \a -> runParser (g a) ks
                                                  -----
                                                  -----

Notes that Success a r is simply the function type a -> r.

If we replace the definition of runParser with:

runParser :: Success a a -> IResult a

we'll get a type error at the above underlined location. To understand this one can work out the following types:

ks                                      :: Success b b
runParser m $ \a -> runParser (g a) ks  :: IResult b
\a -> runParser (g a) ks                :: Success b b  == b -> b
a :: b

but from the expression (g a) we can also conclude that a has type a which gives us our type error.

Basically Parser a can be thought of as a method (or computation) of generating value of type a and runParser p ks is the way to take that value and feed it to a function which takes an a. The continuation function ks can have type a -> r for any r - the only requirement is that its input type is a. By using Success a a in defining runParser we limit the applicability of runParser to functions of type a -> a. That's why we want to define runParser as:

runParser :: Parser a -> (a -> r) -> r

This CPS-style is a very different approach to parsing than what is presented in Monadic Parsing in Haskell

2
On

The forall r part means that this type works for all r as your result without it needing to be specified on the left hand side of the newtype declaration. As stated in the Haskell wikibook:

The forall keyword is used to explicitly bring type variables into scope.

The r type variable isn't in scope until forall is used. That wikibook page explains things pretty well with several examples, so I'll encourage you to look there, but the crux of the article is that forall acts like an intersection of types, so if you consider types as sets of elements, with Bool = {False, True, ⊥}, Int = {minBound..maxBound, ⊥}, Char = {minBound..maxBound, ⊥}, etc. (expand minBound..maxBound as appropriate, and means undefined, usually called bottom), then

forall a. a

is the intersection of all types, namely {⊥}, while

forall a. Show a => a

is the intersection of all types that are constrained by Show. In this context, when you have a type like

data T = forall a. MkT a

Then the constructor MkT has type forall a. a -> T, allowing you to convert any type into a T, this trick allows you to have heterogeneous lists:

[MkT (), MkT 1, MkT "hello"] :: [T]

But you can't do much with this type, how would you unwrap the MkT to do anything with it? You could pattern match on the constructor, but then you'd just have a value of type a with no information about it. If you had

data S = forall s. Show s => MkS s

Then you could do something like

-- Pass through instance of Show
instance Show S where show (MkS s) = show s

map show [MkS "hello", MkS 5, MkS (Just 2)]
["\"hello\"", "5", "Just 2"]

In the case of attoparsec, it is somewhat similar to the ST monad in how it uses the forall to prevent you from writing illegal operations. Again, that wikibooks article provides a pretty good explanation. If it isn't sufficient, comment and I'll see if I can clarify.