Implementation differences between parser combinators and packrat algorithm

1k Views Asked by At

In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind):

instance Derivs d => Monad (Parser d) where 

  -- Sequencing combinator
  (Parser p1) >>= f = Parser parse

    where parse dvs = first (p1 dvs)

          first (Parsed val rem err) = 
            let Parser p2 = f val
            in second err (p2 rem)
          first (NoParse err) = NoParse err

          second err1 (Parsed val rem err) =
            Parsed val rem (joinErrors err1 err)
          second err1 (NoParse err) =
            NoParse (joinErrors err1 err)

-- Result-producing combinator
return x = Parser (\dvs -> Parsed x dvs (nullError dvs))

-- Failure combinator
fail [] = Parser (\dvs -> NoParse (nullError dvs))
fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg))

For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part. Sadly it seems that this concept is not used in this implementation.

What is the big difference between parser combinators and packrat at implementation level?

PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.

2

There are 2 best solutions below

10
On BEST ANSWER

The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.

The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere. Look at the following code (taken from Ford's thesis):

data Derivs = Derivs {
                   dvAdditive :: Result Int,
                   dvMultitive :: Result Int,
                   dvPrimary :: Result Int,
                   dvDecimal :: Result Int,
                   dvChar :: Result Char}


pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
                         l <- Parser dvExpression
                         char ’+’
                         r <- Parser dvExpression
                         char ’)’
                         return (l + r))
                     </> (do Parser dvDecimal)

Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.

This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):

parse :: String -> Derivs
parse s = d where
  d = Derivs add mult prim dec chr
  add = pAdditive d
  mult = pMultitive d
  prim = pPrimary d
  dec = pDecimal d
  chr = case s of
          (c:s’) -> Parsed c (parse s’)
          [] -> NoParse

These snippets are really where the packrat trick happens. It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar. There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat. For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.

0
On

As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat(). Its implementation is almost a drop-in replacement for pyparsing's _parse method (renamed to _parseNoCache), with a _parseCache method.

Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat(), or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.