Parsec start-of-row pattern?


I am trying to parse mediawiki text using Parsec. Some of the constructs in mediawiki markup can only occur at the start of rows (such as the header markup ==header level 2==). In regexp I would use an anchor (such as ^) to find the start of a line.

One attempt in GHCi is

Prelude Text.Parsec> parse (char '\n' *> string "==" *> many1 letter <* string "==") "" "\n==hej=="
Right "hej"

but this is not too good since it will fail on the first line of a file. I feel like this should be a solved problem...

What is the most idiomatic "Start of line" parsing in Parsec?

2 answers

Best answer:

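You can use getPosition and sourceColumn to find out which column the parser is currently looking at. The column number is 1 when the current position is at the start of a line, that is, at the start of input or immediately after a \n character. As far as I can tell, Parsec's default position tracking for Char input resets the column only on \n, so a lone \r does not count as a line start.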

There isn't a built-in combinator for this, but you can easily make it:

import Text.Parsec
import Control.Monad (guard)

startOfLine :: Monad m => ParsecT s u m ()
startOfLine = do
    pos <- getPosition
    guard (sourceColumn pos == 1)

Now you can write your header parser as:

header = startOfLine *> string "==" *> many1 letter <* string "=="
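
To see the combinator behave as described, here is a minimal self-contained sketch. It assumes plain String input, so the signatures are pinned to the Parser alias from Text.Parsec.String rather than left fully polymorphic, and headerAfter is a hypothetical helper introduced only for this demonstration.

```haskell
import Control.Monad (guard)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Succeeds without consuming input only when the parser is at column 1.
startOfLine :: Parser ()
startOfLine = do
  pos <- getPosition
  guard (sourceColumn pos == 1)

-- A level-2 header such as ==hej==, accepted only at the start of a line.
header :: Parser String
header = startOfLine *> string "==" *> many1 letter <* string "=="

-- Hypothetical helper for the demo: consume a fixed prefix, then try a header.
headerAfter :: String -> Parser String
headerAfter prefix = string prefix *> header
```

With these definitions, parse header "" "==hej==" should succeed on the first line of the input, and parse (headerAfter "\n") "" "\n==hej==" should succeed after a newline, while parse (headerAfter "x") "" "x==hej==" should fail because the header would start at column 2.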

Second answer:

You can probably use many (char '\n') instead of just char '\n'. Parser combinators have no built-in notion of the start of a line, because a parser always runs at the start of whatever input remains. The only thing you can do is check manually which characters your input is allowed to start with. Using many (char '\n') ensures that there are only zero or more empty lines before a header such as == my header ==.
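
A rough sketch of that suggestion, applied to the parser from the question, might look like the following. The name headerSkippingNewlines is made up for illustration, and the type is fixed to Parser over String for simplicity.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip zero or more newline characters, then parse a ==header==.
headerSkippingNewlines :: Parser String
headerSkippingNewlines =
  many (char '\n') *> string "==" *> many1 letter <* string "=="
```

This accepts a header on the first line of the input as well as one preceded by blank lines. Note that it does not check the current position, so if it is run after other text on the same line it will still succeed.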