I need help with this program I am trying to write in Haskell. I have written most of it, and here is what I am basically trying to do:
- When I write
parse "a + b"
in the terminal I want this as output:
Plus (Word "a") (Word "b")
- When I write
parse "a - 2 * b + c"
in the terminal I want this as output:
Minus (Word "a") (Plus (Mult (Num 2) (Word "b")) (Word "c"))
My code so far:
data Ast
= Word String
| Num Int
| Mult Ast Ast
| Plus Ast Ast
| Minus Ast Ast
deriving (Eq, Show)
tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
| isDigit c =
let (cs, s') = collectWhile isDigit s
in (c : cs) : tokenize s'
| isAlpha c =
let (cs, s') = collectWhile isAlpha s
in (c : cs) : tokenize s'
| otherwise = error ("unexpected character " ++ show c)
collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)
isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
parseU :: [String] -> (Ast, [String])
parseU ("+" : s0) =
let (e1, s1) = parseU s0
(e2, s2) = parseU s1
in (Plus e1 e2, s2)
parseU ("*" : s0) =
let (e1, s1) = parseU s0
(e2, s2) = parseU s1
in (Mult e1 e2, s2)
parseU (t : ts)
| isNumToken t = (Num (read t), ts)
| isWordToken t = (Word t, ts)
| otherwise = error ("unrecognized token " ++ show t)
parseU [] = error "unexpected end of input"
isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs
parse :: String -> Ast
parse s =
case parseU (tokenize s) of
(e, []) -> e
(_, t : _) -> error ("unexpected token " ++ show t)
inn :: Ast -> String
inn (Plus x y) = innP x ++ " + " ++ innP y
inn (Mult x y) = innP x ++ " * " ++ innP y
inn ast = innP ast
innP :: Ast -> String
innP (Num n) = show n
innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
innP (Word w) = w --
innfiks :: String -> String
innfiks s = inn (parse s)
Right now I get an error posting the text I wrote in the terminal, but when I write it like this:
parse "+ a b"
I get the correct output:
Plus (Word "a") (Word "b")
I know I have to change the code so it accepts what I send to the parse function on this form:
value operator value,
and not on this form:
operator value value
But im struggling to find out what and where I have to do this change.
To handle infix operators with precedence, one approach is to introduce a sequence of parsing functions corresponding to the precedence levels. So, if you have "factors" that can be multiplied to create "terms" which can be added or subtracted to create "expressions", you'll want to create parser functions for each of these levels. Parsing a "factor" (i.e., a word or number) is easy, since you've already written the code:
Parsing a term is trickier. You want to start by parsing a factor and then, optionally, a
*
followed by another factor, and then treat that as a term to be further optionally multiplied by another factor, and so on. The following is one way to do it:If you want, try writing a similar
parseExpr
to parse terms separated by+
characters (skipping subtraction for now), and test it on something like:For spoilers, here's a version that handles both
+
and-
, though note that your tokenizer doesn't yet handle subtraction correctly, so you'll have to fix that first.With that in place, you can point
parse
to the right parser:and your example should work:
Note that this is slightly different than the output you said you wanted, but this ordering is correct for left-associative operators (which is important for correctly handling
-
). That is, you want:to parse as:
so that an evaluator will calculate the correct answer of 2. If you parse it as:
your evaluator would calculate the wrong answer of 0.
However, if you really want to parse with right-associative operators, see below.
The full modified code for left-associative operators:
For right-associative operators, modify these definitions: