I'm looking at this example from the attoparsec docs:
simpleComment = string "<!--" *> manyTill anyChar (string "-->")
This will build a [Char] instead of a ByteString slice. That's not good with huge comments, right?
The other alternative, takeWhile:
takeWhile :: (Word8 -> Bool) -> Parser ByteString
cannot accept a parser (i.e. cannot match a ByteString, only a Word8).
Is there a way to parse a chunk of a ByteString with attoparsec that doesn't involve building a [Char] in the process?
You can use `scan`. It would look something like this:
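A minimal sketch of such a transition function, assuming the state is an `Int` that counts how many bytes of `"-->"` have been matched. The name `dfa` matches the call used in this answer; the `transitions` table is a hypothetical name, and this is one way to write it rather than a definitive implementation:

```haskell
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- Hypothetical association list: ((current state, input char), next state).
-- Any (state, char) pair not listed falls back to state 0.
transitions :: [((Int, Char), Int)]
transitions =
  [ ((0, '-'), 1)  -- first '-'
  , ((1, '-'), 2)  -- second '-'
  , ((2, '-'), 2)  -- in a longer run of dashes, the last two still count
  , ((2, '>'), 3)  -- "-->" is complete
  ]

-- Returning Nothing tells scan to stop before consuming the current byte.
dfa :: Int -> Word8 -> Maybe Int
dfa 3 _ = Nothing
dfa s w = Just (fromMaybe 0 (lookup (s, toEnum (fromEnum w)) transitions))
```

The `toEnum (fromEnum w)` step converts each `Word8` to a `Char` so the table can be written with character literals.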
And then use `scan 0 dfa` to take bytes up to and including the final `"-->"`. The state I'm using here tells how many characters of `"-->"` we've seen so far. Once we've seen them all, we inform `scan` that it's time to stop. This is just to illustrate the idea; for efficiency you might want to use a more efficient data structure than association lists, move the `*Enum` calls into the lookup table, and even consider writing the function directly.
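As a sketch of the "write the function directly" option, the transitions can be pattern-matched on raw `Word8` values, which avoids both the association list and the `Enum` conversions. The names `dfaDirect`, `commentBody`, and `example` are hypothetical. Because `scan` itself never fails, this version assumes you want an explicit check that the scan really ended on the terminator:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.ByteString (Parser, parseOnly, scan, string)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Word (Word8)

-- State = number of bytes of "-->" matched so far (0..3).
-- 45 is ASCII '-', 62 is ASCII '>'.
dfaDirect :: Int -> Word8 -> Maybe Int
dfaDirect 3 _  = Nothing                -- terminator seen: stop before this byte
dfaDirect s 45 = Just (min 2 (s + 1))   -- another '-', capped at two
dfaDirect 2 62 = Just 3                 -- '>' right after "--"
dfaDirect _ _  = Just 0                 -- anything else resets the match

-- Parses "<!--", then returns the comment body as a ByteString,
-- without the trailing "-->".
commentBody :: Parser ByteString
commentBody = do
  _ <- string "<!--"
  bytes <- scan 0 dfaDirect
  -- scan also stops at end of input, so verify the terminator is present.
  if "-->" `B.isSuffixOf` bytes
    then return (B.take (B.length bytes - 3) bytes)
    else fail "unterminated comment"

example :: Either String ByteString
example = parseOnly commentBody "<!-- hi -->rest"
```

Here `example` should evaluate to `Right " hi "`, with `rest` left unconsumed. If the final state is needed rather than a suffix check, attoparsec's `runScanner` returns it alongside the consumed bytes.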