I'm looking at this example from attoparsec docs:
simpleComment = string "<!--" *> manyTill anyChar (string "-->")
This will build a [Char] instead of a ByteString slice. That's not good with huge comments, right?
The other alternative, takeWhile:
takeWhile :: (Word8 -> Bool) -> Parser ByteString
cannot accept a parser (i.e. it cannot match a ByteString, only a Word8).
Is there a way to parse a chunk of ByteString with attoparsec that doesn't involve building a [Char] in the process?
You can use scan. It would look something like this:
And then use scan 0 dfa to take bytes up to and including the final "-->". The state I'm using here tells how many characters of "-->" we've seen so far. Once we've seen them all we inform scan that it's time to stop. This is just to illustrate the idea; for efficiency you might want to use a more efficient data structure than association lists, move the *Enum calls into the lookup table, and even consider writing the function directly.
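As a rough sketch of the "write the function directly" variant, the state machine could be spelled out by pattern matching instead of a lookup table. This assumes scan from Data.Attoparsec.ByteString with the type s -> (s -> Word8 -> Maybe s) -> Parser ByteString. The names step, commentBody and parseComment are made up for this illustration, and stripping the trailing "-->" from the result is an extra choice made here, not part of the approach described above.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Attoparsec.ByteString (Parser, parseOnly, scan, string)
import qualified Data.ByteString            as B
import           Data.Word                  (Word8)

-- State: how many bytes of "-->" have been matched so far (0..3).
-- 45 is '-', 62 is '>'. (Hypothetical helper, written for this sketch.)
step :: Int -> Word8 -> Maybe Int
step 3 _  = Nothing                 -- terminator fully seen: stop before this byte
step 2 62 = Just 3                  -- "--" followed by '>' completes "-->"
step n 45 = Just (min 2 (n + 1))    -- another '-': at most two count as a prefix
step _ _  = Just 0                  -- any other byte resets the match

-- Parses "<!--", then the comment text as a single ByteString, then "-->".
commentBody :: Parser B.ByteString
commentBody = do
  _     <- string "<!--"
  bytes <- scan 0 step
  -- scan also succeeds when the input simply runs out, so check that the
  -- terminator was really reached before dropping it from the result.
  if "-->" `B.isSuffixOf` bytes
    then pure (B.take (B.length bytes - 3) bytes)
    else fail "unterminated comment"

-- Convenience wrapper for trying the parser on a complete input.
parseComment :: B.ByteString -> Either String B.ByteString
parseComment = parseOnly commentBody
```

With this, parseComment "<!-- hi --> rest" should give Right " hi ", and an input with no closing "-->" should give a Left. Note that the '-' case keeps the state at 2 rather than resetting it, so a comment ending in "--->" is still recognised.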