Parsec parsing and separating different structures

732 Views Asked by At

Let's say I have different parsers p1, ..., pk. I want to define a function pk :: Parser ([t1], ..., [tk]) where pi :: Parser ti.

That will parse a collection of strings (one after the other) that match any of p1...pk and separate them in the corresponding lists. Assume for simplicity that none of the strings matches two parsers.

I managed to do it, but I am really struggling to find an elegant way (preferably using many or any other builtin parsec parser).

3

There are 3 best solutions below

1
On BEST ANSWER

The first step is to turn each of your parsers into parser of the big type:

p1 :: Parser t1
p2 :: Parser t2
p3 :: Parser t3
p1 = undefined
p2 = undefined
p3 = undefined

p1', p2', p3' :: Parser ([t1], [t2], [t3])
p1' = fmap (\x -> ([x], [], [])) p1
p2' = fmap (\x -> ([], [x], [])) p2
p3' = fmap (\x -> ([], [], [x])) p3

Now, we repeatedly choose from these last parsers, and concatenate the results at the end:

parser :: Parser ([t1], [t2], [t3])
parser = fmap mconcat . many . choice $ [p1', p2', p3']

There are Monoid instances for tuples up to size five; beyond that, you can either use nested tuples or a more appropriate data structure.

4
On

Representing the parsers as a list makes this easy. Using:

choice :: [Parser a] -> Parser a
many :: Parser a -> Parser [a]

We can write:

combineParsers :: [Parser a] -> Parser [a]
combineParsers = many . choice

This isn't quite right, as it bundles them all into a single list. Let's associate each parser with a unique identifier:

combineParsers' :: [(k, Parser a)] -> Parser [(k, a)]
combineParsers' = many . choice . combine
  where combine = map (\(k,p) -> (,) k <$> p)

We can then convert this back to the list form:

combineParsers :: [Parser a] -> Parser [[a]]
combineParsers ps = map snd . sortBy fst <$> combineParsers' (zip [0..] ps)

You could perhaps make this more efficient by writing combineParsers' :: [(k, Parser a)] -> Parser (Map k [a]) instead, using e.g. combine = map $ \(k,p) -> fmap (\a -> Map.insertWith (++) k [a]) p.

This requires that all the parsers have the same result type, so you'll need to wrap each of their results in a data-type with Cons <$> p for each p. You can then unwrap the constructor from each list. Admittedly, this is fairly ugly, but to do it heterogeneously with tuples would require even uglier type-class hacks.

There might be a simpler solution for your specific use-case, but this is the easiest way to do it generically that I can think of.

4
On

You can't do this completely generically without type-level trickery, unfortunately. However, an approach along the lines of Daniel Wagner's suggestion can be constructed in DIY style, using polymorphic combinators and some pre-existing recursive instances. I'll present a simple example to demonstrate.

First, a few simpleminded parsers to test with:

type Parser = Parsec String ()

parseNumber :: Parser Int
parseNumber = read <$> many1 digit

parseBool :: Parser Bool
parseBool = string "yes" *> return True
        <|> string "no" *> return False

parseName :: Parser String
parseName = many1 letter

Next we create a "base case" type to mark the end of the possibilities, and give it a parser that always succeeds on no input.

data Nil = Nil deriving (Eq, Ord, Read, Show, Enum, Bounded)
instance Monoid Nil where 
    mempty = Nil
    mappend _ _ = Nil

parseNil :: Parser Nil
parseNil = return Nil

This is equivalent to (), of course--I'm only creating a new type to disambiguate in case we actually want to parse (). Next, the combinator that chains parsers together:

infixr 3 ?|>
(?|>) :: (Monoid b) => Parser a -> Parser b -> Parser ([a], b)
p1 ?|> p2 = ((,) <$> ((:[]) <$> p1) <*> pure mempty)
        <|> ((,) <$> pure [] <*> p2)

What's going on here is that p1 ?|> p2 tries p1, and if it succeeds wraps that in a singleton list and fills in mempty for the second element of the tuple. If p1 fails, if fills in an empty list and uses p2 for the second element.

parseOne :: Parser ([Int], ([Bool], ([String], Nil)))
parseOne = parseNumber ?|> parseBool ?|> parseName ?|> parseNil

Combining parsers with the new combinator is simple, and the result type is pretty self-explanatory.

parseMulti :: Parser ([Int], ([Bool], ([String], Nil)))
parseMulti = mconcat <$> many1 (parseOne <* newline)

We're relying on the recursive Monoid instance for tuples and the usual instance for lists to combine multiple lines. Now, to show that it works, define a quick test:

runTest = parseTest parseMulti testInput

testInput = unlines [ "yes", "123", "acdf", "8", "no", "qq" ]

Which parses successfully as Right ([123,8],([True,False],(["acdf","qq"],Nil))).