Slices with attoparsec

Asked by levant pied on 22 June 2025 at 10:51

I'm looking at this example from the attoparsec docs:

simpleComment :: Parser String
simpleComment = string "<!--" *> manyTill anyChar (string "-->")

This will build a [Char] instead of a ByteString slice. That's not good with huge comments, right?

The other alternative, takeWhile:

takeWhile :: (Word8 -> Bool) -> Parser ByteString

cannot accept a parser: its predicate sees only one Word8 at a time, so it cannot stop at a multi-byte delimiter such as "-->".

Is there a way to parse a chunk of ByteString with attoparsec that doesn't involve building a [Char] in the process?

Answer by Daniel Wagner on 05 September 2020 at 00:00 (best answer)

You can use scan:

scan :: s -> (s -> Word8 -> Maybe s) -> Parser ByteString
A stateful scanner. The predicate consumes and transforms a state argument, and each transformed state is passed to successive invocations of the predicate on each byte of the input until one returns Nothing or the input ends.
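To see that state-threading contract in isolation, here is a small hypothetical scanner (not from the attoparsec docs) for the raw contents of a double-quoted string with backslash escapes. The Bool state records whether the previous byte was an escaping backslash. The byte on which the step function returns Nothing is not consumed, so the closing quote is left for word8 34 to match; this is also why the comment DFA needs a state 3 that stops one byte after the '>'. The names step and quoted are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Attoparsec.ByteString (Parser, parseOnly, scan, word8)
import Data.Word (Word8)

-- State: was the previous byte an (unescaped) backslash?
step :: Bool -> Word8 -> Maybe Bool
step True  _  = Just False  -- byte after a backslash is taken literally
step False 92 = Just True   -- 92 is a backslash: escape the next byte
step False 34 = Nothing     -- 34 is '"': stop here; this byte is NOT consumed
step False _  = Just False  -- ordinary byte: keep scanning

-- Raw contents between double quotes, with escapes left in place.
quoted :: Parser B.ByteString
quoted = word8 34 *> scan False step <* word8 34

main :: IO ()
main = print (parseOnly quoted "\"say \\\"hi\\\"\" tail")
```

Escaped quotes inside the string do not end the scan, and the escapes are left in the returned ByteString; unescaping, if needed, would be a separate pass.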

It would look something like this:

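import Control.Applicative ((<|>))
import Data.Word (Word8)

-- State = how many characters of "-->" have been matched so far.
transitions :: [((Int, Char), Int)]
transitions = [((0, '-'), 1), ((1, '-'), 2), ((2, '-'), 2), ((2, '>'), 3)]

dfa :: Int -> Word8 -> Maybe Int
dfa 3 _ = Nothing  -- "-->" fully matched: stop before this byte
dfa s w = lookup (s, toEnum (fromEnum w)) transitions <|> Just 0  -- no match: reset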

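And then use scan 0 dfa to take bytes up to and including the closing "-->". The state I'm using here counts how many characters of "-->" we've seen so far. Once we've seen all three, dfa returns Nothing on the next byte, which tells scan to stop (that byte is left unconsumed). Note that scan itself never fails: if the input ends before "-->" shows up, it simply returns everything it consumed, so check the result if unterminated comments matter to you. This is just to illustrate the idea; for speed you might want a better data structure than an association list, a table keyed on Word8 so the toEnum/fromEnum conversions disappear, or even a directly written transition function.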
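Putting the pieces together, here is a minimal runnable sketch, assuming the strict-ByteString API in Data.Attoparsec.ByteString, that follows the last suggestion and writes the transition function directly on Word8 byte values. The names dfaDirect and simpleComment are hypothetical, and the isSuffixOf check is an extra step not in the answer above, added because scan on its own would happily accept an unterminated comment.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Attoparsec.ByteString (Parser, parseOnly, scan, string)
import Data.Word (Word8)

-- Hand-written version of the DFA: the state counts how many
-- bytes of "-->" have been matched so far.
dfaDirect :: Int -> Word8 -> Maybe Int
dfaDirect 3 _  = Nothing  -- already saw "-->": stop before this byte
dfaDirect 0 45 = Just 1   -- 45 is '-'
dfaDirect 1 45 = Just 2
dfaDirect 2 45 = Just 2   -- extra dashes ("--->") stay in state 2
dfaDirect 2 62 = Just 3   -- 62 is '>'
dfaDirect _ _  = Just 0   -- anything else resets the match

-- Parse "<!-- ... -->" and return the body (without "-->") as a ByteString.
simpleComment :: Parser B.ByteString
simpleComment = do
  _ <- string "<!--"
  body <- scan 0 dfaDirect
  -- scan never fails, so make sure we actually stopped on "-->"
  if "-->" `B.isSuffixOf` body
    then pure (B.take (B.length body - 3) body)
    else fail "unterminated comment"

main :: IO ()
main = do
  print (parseOnly simpleComment "<!-- a -- b --> rest")
  print (parseOnly simpleComment "<!-- no end")
```

The first call should print Right " a -- b " and the second a Left carrying the failure message. The body comes back as a ByteString, so no intermediate [Char] is built, and B.take on a strict ByteString is O(1) and shares the scanned bytes.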
