Insert a character into parser combinator character stream in Haskell

569 Views Asked by At

This question is related to both Parsec and uu-parsinglib. When we write parser combinators, they process characters streams from compiler. Is it somehow possible to parse a character and put it back (or return another character back) to the input stream?

I want for example to parse input "test + 5", parse the t, e, s, t and after recognition of test pattern, put for example v character back into the character stream, so while continuating the parsing process we are matching against v + 5

I do not want to use this in any particular case for now - I want to deeply learn the possibilities.

3

There are 3 best solutions below

0
On

This is easily done in uu-parsinglib using the pSwitch function. But the question is why you want to do so? Because the v is missing from the input? In that case uu-parsinglib will perform error correction automatically so you do not need something like this. Otherwise you can write

pSwitch :: (st1 -> (st2, st2 -> st1)) -> P st2 a -> P st1 a
pInsert_v = pSwitch (\st1 -> (prepend v st2, id) (pSucceed ())

It depends on your actual state type how the v is actually added, so you will have to define the function

prepend
yourself. I do not know e.g. how such an insertion would influence the current position in the file etc.

Doaitse Swierstra

0
On

I'm not sure if it's possible with these parsers directly, but in general you can accomplish it by combining parsers with some streaming that allows injecting leftovers.

For example, using attoparsec-conduit you can turn a parser into a conduit using

sinkParser :: (AttoparsecInput a, MonadThrow m)
           => Parser a b -> Consumer a m b

where Consumer is a special kind of conduit that doesn't produce any output, only receives input and returns a final value.

Since conduits support leftovers, you can create a helper method that converts a parser that optionally returns a value to be pushed into the stream into a conduit:

import Data.Attoparsec.Types
import Data.Conduit
import Data.Conduit.Attoparsec
import Data.Functor

reinject :: (AttoparsecInput a, MonadThrow m)
    => Parser a (Maybe a, b) -> Consumer a m b
reinject p = do
    (lo, r) <- sinkParser p
    maybe (return ()) leftover lo
    return r

Then you convert standard parsers to conduits using sinkParser and these special parsers using reinject, and then combine conduits instead of parsers.

0
On

I think the simplest way to archive this is to build a multi-layered parser. Think of a lexer + parser combination. This is a clean approach to this problem.

You have to separate the two kind of parsing. The search-and-replace parsing goes to the first parser and the build-the-AST parsing to the second. Or you can create an intermediate token representation.

import Text.Parsec
import Text.Parsec.String

parserLvl1 :: Parser String
parserLvl1 = many (try (string "test" >> return 'v') <|> anyChar)

parserLvl2 :: Parser Plus
parserLvl2 = do text1 <- many (noneOf "+")
                char '+'
                text2 <- many (noneOf "+")
                return $ Plus text1 text2

data Plus = Plus String String
  deriving Show

wholeParse :: String -> Either ParseError Plus
wholeParse source = do res1 <- parse parserLvl1 "lvl1" source
                       res2 <- parse parserLvl2 "lvl2" res1
                       return res2

Now you can parse your example. wholeParse "test+5" results in Right (Plus "v" "5").

Possible variations:

  • Create a class and an instance for combining wrapped parser stages. (Possibly carrying parser state.)
  • Create an intermediate representation, a stream of tokens