Make a parser ignore all redundant whitespace

Question

Asked by Guildenstern on 02 December 2013 at 22:20

Say I have a Parser p in Parsec and I want it to ignore all superfluous/redundant whitespace. For example, suppose I define a list as starting with "[", ending with "]", and containing integers separated by whitespace. I don't want any errors if there is whitespace in front of the "[", after the "]", between the "[" and the first integer, and so on.

In my case, I want this to work for my parser for a toy programming language.

I will update with code if that is requested/necessary.

bheklilr On 02 December 2013 at 22:37

Just surround everything with spaces:

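import Text.Parsec

parseIntList :: Parsec String u [Int]
parseIntList = do
    spaces                      -- whitespace before '['
    _ <- char '['
    spaces                      -- whitespace after '['
    first <- many1 digit
    -- Each further element: optional whitespace, ',', optional whitespace, digits.
    -- try backtracks when the whitespace is followed by ']' instead of ','.
    rest <- many $ try $ do
        spaces
        _ <- char ','
        spaces
        many1 digit
    spaces                      -- whitespace before ']'
    _ <- char ']'
    spaces                      -- whitespace after ']'
    return $ map read $ first : rest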

This is a very basic version. There are cases where it will fail (such as an empty list), but it's a good start towards getting something to work.
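The empty-list case mentioned above could be covered by swapping the hand-written first/rest logic for Parsec's sepBy, which accepts zero elements. This is only a sketch in the same explicit-spaces style; the name parseIntList' is made up for illustration and is not part of the original answer.

```haskell
import Control.Applicative ((<*), (*>))
import Text.Parsec

-- Hypothetical variant of parseIntList: sepBy allows zero or more elements,
-- so "[]" and "[ ]" parse to the empty list.
parseIntList' :: Parsec String u [Int]
parseIntList' = do
    spaces                                   -- whitespace before '['
    _ <- char '['
    spaces                                   -- whitespace after '['
    -- Each element and each ',' skips the whitespace that follows it.
    digitRuns <- (many1 digit <* spaces) `sepBy` (char ',' *> spaces)
    _ <- char ']'
    spaces                                   -- whitespace after ']'
    return (map read digitRuns)
```

Because every element consumes its own trailing whitespace, no try is needed here: by the time the parser looks for ',' or ']', there is no pending whitespace to backtrack over.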

@Joehillen's suggestion will also work, but it requires some more type magic to use the token features of Parsec. The spaces parser skips zero or more characters that satisfy Data.Char.isSpace, which covers the ASCII whitespace characters (space, tab, newline, carriage return, form feed, vertical tab) as well as other Unicode space characters.

Jon Purdy On 03 December 2013 at 02:24 (accepted answer)

Use combinators to say what you mean:

import Control.Applicative
import Text.Parsec
import Text.Parsec.String

program :: Parser [[Int]]
program = spaces *> many1 term <* eof

term :: Parser [Int]
term = list

list :: Parser [Int]
list = between listBegin listEnd (number `sepBy` listSeparator)

listBegin, listEnd, listSeparator :: Parser Char
listBegin = lexeme (char '[')
listEnd = lexeme (char ']')
listSeparator = lexeme (char ',')

lexeme :: Parser a -> Parser a
lexeme parser = parser <* spaces

number :: Parser Int
number = lexeme $ do
  digits <- many1 digit
  return (read digits :: Int)

Try it out:

λ :l Parse.hs
Ok, modules loaded: Main.
λ parseTest program " [1, 2, 3] [4, 5, 6] "
[[1,2,3],[4,5,6]]

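This lexeme combinator takes a parser and allows arbitrary whitespace after it. Then you only need to use lexeme around the primitive tokens in your language such as listSeparator and number. Because every token consumes the whitespace that follows it, the only leading whitespace left to handle is at the very start of the input, which program skips once with spaces.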
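The question describes integers separated by whitespace rather than commas. Under that reading, the same lexeme idea might be applied as in the sketch below; intList and intListOnly are names invented here for illustration, and number deliberately ignores signs.

```haskell
import Control.Applicative ((<$>), (<*), (*>))
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- One or more digits read as an Int (no sign handling).
number :: Parser Int
number = lexeme (read <$> many1 digit)

-- "[", zero or more whitespace-separated integers, "]".
-- Hypothetical name; the separator is just the whitespace lexeme skips.
intList :: Parser [Int]
intList = between (lexeme (char '[')) (lexeme (char ']')) (many number)

-- Skip leading whitespace once, then require the whole input to be consumed.
intListOnly :: Parser [Int]
intListOnly = spaces *> intList <* eof
```

In GHCi, parseTest intListOnly " [ 1 2 3 ] " should print [1,2,3]. No explicit separator parser is needed, since each number already swallows the whitespace after it.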

Alternatively, you can parse the stream of characters into a stream of tokens, then parse the stream of tokens into a parse tree. That way, both the lexer and parser can be greatly simplified. It’s only worth doing for larger grammars, though, where maintainability is a concern; and you have to use some of the lower-level Parsec API such as tokenPrim.
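A rough sketch of that two-stage approach, for the same list grammar, might look like the following. The Token type and the names lexer, satisfyTok, tok, int, listT and parseLists are all assumptions made for this illustration; only tokenPrim, getPosition and the other combinators come from Parsec itself.

```haskell
import Control.Applicative ((<$>), (<$), (<*>), (<*), (*>))
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token type for the list grammar.
data Token = TOpen | TClose | TComma | TInt Int
  deriving (Eq, Show)

-- Stage 1: characters to tokens, each tagged with its source position.
-- Whitespace is dropped here, so stage 2 never sees it.
lexer :: Parser [(SourcePos, Token)]
lexer = spaces *> many (located <* spaces) <* eof
  where
    located  = (,) <$> getPosition <*> oneToken
    oneToken = TOpen  <$ char '['
           <|> TClose <$ char ']'
           <|> TComma <$ char ','
           <|> TInt . read <$> many1 digit

-- Stage 2 works on the token list instead of on characters.
type TokParser a = Parsec [(SourcePos, Token)] () a

-- Accept one token if the function returns Just, built on tokenPrim.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok f = tokenPrim (show . snd) nextPos (f . snd)
  where
    -- Report the position of the next token, if there is one.
    nextPos _   _ ((pos, _) : _) = pos
    nextPos pos _ []             = pos

-- Match one specific token.
tok :: Token -> TokParser ()
tok t = satisfyTok (\t' -> if t == t' then Just () else Nothing)

-- Match an integer token and return its value.
int :: TokParser Int
int = satisfyTok (\t -> case t of
                          TInt n -> Just n
                          _      -> Nothing)

listT :: TokParser [Int]
listT = between (tok TOpen) (tok TClose) (int `sepBy` tok TComma)

-- Run both stages; a failure in either one is returned as Left.
parseLists :: String -> Either ParseError [[Int]]
parseLists src = parse lexer "" src >>= parse (many1 listT <* eof) ""
```

With this split, the grammar in listT contains no whitespace handling at all, at the cost of the extra plumbing around tokenPrim.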
