I am struggling with Parsec to parse a small subset of the Google project wiki syntax and convert it into HTML. My syntax is limited to text sequences and item lists. Here is an example of what I want to recognize:
Text that can contain any kind of characters,
except the string "\n *"
 * list item 1
 * list item 2
End of list
My code so far is:
import Text.Blaze.Html5 (Html, toHtml)
import qualified Text.Blaze.Html5 as H
import Text.ParserCombinators.Parsec hiding (spaces)

parseList :: Parser Html
parseList = do
    items <- many1 parseItem
    return $ H.ul $ sequence_ items

parseItem :: Parser Html
parseItem = do
    string "\n *"
    item <- manyTill anyChar $
        (try $ lookAhead $ string "\n *") <|>
        (try $ string "\n\n")
    return $ H.li $ toHtml item

parseText :: Parser Html
parseText = do
    text <- manyTill anyChar $
        (try $ lookAhead $ string "\n *") <|>
        (eof >> (string ""))
    return $ toHtml text

parseAll :: Parser Html
parseAll = do
    l <- many (parseList <|> parseText)
    return $ H.html $ sequence_ l
When applying parseAll to any sequence of characters, I get the following error message: "*** Exception: Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string."
I understand that it is because my parser parseText can read empty strings, but I can't see any other way. How can I recognize text delimited by a string (here, "\n *")?
I am also open to any remarks or suggestions concerning the way I am using Parsec. I can't help but see that my code is a bit ugly. Can I do all this in a simpler way? For example, there is code duplication (which is kind of painful) because of the string "\n *", which is used to recognize the end of a text sequence, the beginning of a list item, AND the end of a list item...
I removed the HTML stuff because for whatever reason I couldn't get blaze-html to install on my machine. But in principle it should be essentially the same thing. This parses strings delimited by the string "\n *" and ended by the string "\n\n". I don't know if having a leading \n is what you want, but that is easy to fix.

Also, I don't know if the empty string is valid. You should change sepBy1 to sepBy if it is.

As for the error you were getting: you have string "" inside of many. Not only does this give the error you got, it doesn't make any sense! The parser string "" will always succeed without consuming anything, since the empty string is a prefix of all strings and "" ++ x == x. If you try to do this multiple times then you will never finish parsing.

Besides all that, your parseList should parse your language. It essentially does the same thing that sepBy does. I just think sepBy is cleaner :)
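
For illustration, here is a minimal sketch of the sepBy1 approach described above, without the HTML. It is not the code from either post: the names Block, bulletMark, listItem, itemList, plainText, document and parseDocument are made up for this example, and it assumes that each list item fits on one line and that a list is closed by "\n\n". The "\n *" marker is written once and reused, and the text parser uses many1 so that it never succeeds on empty input, which is what makes it safe to put under many.

```haskell
import Text.ParserCombinators.Parsec

-- Result type for this sketch (hypothetical; stands in for the Html values).
data Block = PlainText String | ItemList [String]
  deriving (Show, Eq)

-- The "\n *" marker, written once. `try` makes it fail without consuming
-- input when only the '\n' matches. Spaces after the '*' are skipped.
bulletMark :: Parser ()
bulletMark = try (string "\n *") >> skipMany (char ' ')

-- One list item: the rest of the line (assumption: items are single-line).
listItem :: Parser String
listItem = many (noneOf "\n")

-- A list: a leading marker, items separated by markers, closed by "\n\n".
-- Per the answer, sepBy would be the variant to use if zero items were allowed.
itemList :: Parser [String]
itemList = do
  _ <- bulletMark
  items <- listItem `sepBy1` bulletMark
  _ <- string "\n\n"
  return items

-- Plain text: one or more characters, stopping before the next marker.
-- many1 guarantees the parser consumes input whenever it succeeds.
plainText :: Parser String
plainText = many1 (notFollowedBy bulletMark >> anyChar)

-- The whole input: any mix of lists and text, up to end of input.
document :: Parser [Block]
document = many (ItemList <$> itemList <|> PlainText <$> plainText) <* eof

parseDocument :: String -> Either ParseError [Block]
parseDocument = parse document "(input)"
```

With these definitions, parseDocument "intro\n * a\n * b\n\nEnd" should give Right [PlainText "intro",ItemList ["a","b"],PlainText "End"]. A list that is not closed by "\n\n" is reported as a parse error rather than silently accepted.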