I'm new to lexing and parsing beyond small cases. With that caveat given, my problem is that I'm trying to parse a JSP-like dialect in Scala. I'm lexing the char stream, and when I get to a JSP-like tag, I'm stuck. The input looks like:
Some text<%tag attribute="value"%>more stuff.
My lexer right now is attempting to pull out the tag part and tokenize, so I have something like:
def document: Parser[Token] = tag | regular

def tag: Parser[Token] =
  elem('<') ~ elem('%') ~ rep1(validTagName) ~ tagAttribute.* ~ elem('%') ~ elem('>') ^^ {
    case a ~ b ~ tagName ~ tagAttributes ~ c ~ d =>
      Tag(tagName.foldLeft("")(_ + _)) :: tagAttributes.flatMap(_)
  }

def validTagName: Parser[Token] = elem("", Character.isLetter(_)) // over-simplified
... Other code for tagAttribute and Tag extends Token here
You can probably spot about half a dozen problems right now; I know I can spot a few myself, but this is where I'm currently at. Ultimately the token function is supposed to return a Parser, and if I understand this all correctly, a Parser can be composed of other parsers. My thinking is that I should be able to construct a parser by combining several other Parser[Token] objects. I don't know how to do that, and I don't fully understand whether that is the best way to do this.
It sounds like you may be mixing up your lexical and syntactic parsers. If you want to go the route of writing your own lexer, you'll need two parsers, with the first extending lexical.Scanners (and therefore providing a token method of type Parser[Token]), and with the other extending syntactical.TokenParsers and referring to the first in its implementation of that trait's abstract lexical method.

Unless you have some specific reason to write your own lexer, though, it may be easier to use something like RegexParsers. With a RegexParsers-based parser, something like

    MyParser.parseAll(MyParser.tag, "<%tag attribute=\"value\"%>")

works as expected. Note that since we're not writing a lexer, there's no obligation to provide a Parser[Token] method.
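The answer's RegexParsers code block appears to have been lost in extraction, so here is one possible sketch of what such a parser might look like, assuming the scala-parser-combinators library is on the classpath. The object name MyParser matches the call above; the Tag case class and the attribute representation are assumptions, not from the original answer:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical result type; the original post's Tag token may differ.
case class Tag(name: String, attributes: List[(String, String)])

object MyParser extends RegexParsers {
  // Letters only, matching the question's over-simplified validTagName.
  def name: Parser[String] = """[a-zA-Z]+""".r

  // attribute="value" (no escaping handled, for brevity)
  def tagAttribute: Parser[(String, String)] =
    name ~ ("=" ~> "\"" ~> """[^"]*""".r <~ "\"") ^^ {
      case key ~ value => (key, value)
    }

  // <%tag attribute="value"%>
  def tag: Parser[Tag] =
    "<%" ~> name ~ rep(tagAttribute) <~ "%>" ^^ {
      case tagName ~ attrs => Tag(tagName, attrs)
    }
}
```

Note that parseAll succeeds only if the entire input matches, so to handle a full document like `Some text<%tag attribute="value"%>more stuff.` you'd alternate tag with a plain-text parser, e.g. something like rep(tag | text).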