Treetop grammar issues using regular expressions


I have a simple grammar setup like so:

grammar Test
   rule line
      (adjective / not_adjective)* {
         def content
             elements.map{|e| e.content }
         end
      }
   end
   rule adjective
      ("good" / "bad" / "excellent") {
          def content
              [:adjective, text_value]
          end
      }
   end
   rule not_adjective
      !adjective {
          def content
              [:not_adjective, text_value]
          end
      }
   end
end

Let's say my input is "this is a good ball. let's use it". This gives an error, which I'm not mentioning right now because I want to understand the theory behind why it's wrong first. So, how do I write the not_adjective rule so that it matches anything that is not matched by the adjective rule? In general, how do I write a rule (specifically in Treetop) that "doesn't" match another named rule?

There are 2 best solutions below


Treetop is a parser generator that builds parsers from a special class of grammars called Parsing Expression Grammars (PEGs).
The operational interpretation of !expression is that it succeeds if expression fails and fails if expression succeeds, but in either case it consumes NO input. That is why a rule consisting only of !adjective can never consume any text.
To match anything that expression does not match, combine the dot operator (which matches any single character) with the negation operator to avoid certain "words":

( !expression . )* i.e. "match anything BUT expression"
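
The "consumes no input" point can be checked outside Treetop. The following is only a rough sketch using Ruby's own regex negative lookahead as a stand-in for the PEG `!` operator; it is not Treetop code, and `ADJECTIVE`, `bare_lookahead`, and `lookahead_then_dot` are hypothetical names introduced for illustration.

```ruby
require "strscan"

# Hypothetical regex stand-in for the grammar's `adjective` rule.
ADJECTIVE = /good|bad|excellent/

# Regex analogue of `!adjective` on its own.
# Returns [matched_text, position_after_match].
def bare_lookahead(input)
  scanner = StringScanner.new(input)
  matched = scanner.scan(/(?!#{ADJECTIVE})/)
  [matched, scanner.pos]
end

# Regex analogue of ( !adjective . )*
# Returns [matched_text, position_after_match].
def lookahead_then_dot(input)
  scanner = StringScanner.new(input)
  matched = scanner.scan(/(?:(?!#{ADJECTIVE}).)*/m)
  [matched, scanner.pos]
end

p bare_lookahead("this is a good ball")      # => ["", 0]
p lookahead_then_dot("this is a good ball")  # => ["this is a ", 10]
```

The bare lookahead succeeds but leaves the position at 0, while pairing it with `.` advances one character at a time until an adjective starts.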

The previous answer is incorrect for the OP's question, since it will match any sequence of individual characters up to any adjective. So if you see the string xyzgood, it'll match xyz and a following rule will match the "good" part as an adjective. Likewise, the adjective rule of the OP will match the first three characters of "badge" as the adjective "bad", which isn't what they want.
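
Both failure modes described above can be reproduced with Ruby regexes standing in for the grammar rules. This is a sketch under that assumption, not Treetop output; `ADJECTIVE`, `split_at_adjective`, and `naive_adjective_prefix` are hypothetical names.

```ruby
# Hypothetical regex stand-in for the question's `adjective` rule
# (no check on what follows the word).
ADJECTIVE = /good|bad|excellent/

# Analogue of ( !adjective . )* applied at the start of the input.
# Returns the run it consumes and whatever is left for the next rule.
def split_at_adjective(input)
  run = input[/\A(?:(?!#{ADJECTIVE}).)*/m]
  [run, input[run.length..-1]]
end

# Analogue of the question's adjective rule applied at the start of the input.
def naive_adjective_prefix(input)
  input[/\A#{ADJECTIVE}/]
end

p split_at_adjective("xyzgood")    # => ["xyz", "good"]
p naive_adjective_prefix("badge")  # => "bad"
```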

Instead, the adjective rule should look something like this:

rule adjective
  a:("good" / "bad" / "excellent") ![a-z] {
    def content
      [:adjective, a.text_value]
    end
  }
end

and the not_adjective rule like this:

rule not_adjective
  !adjective w:([a-z]+) {
    def content
      [:not_adjective, w.text_value]
    end
  }
end

Include handling for upper-case letters, hyphenation, apostrophes, etc., as necessary. You'll also need whitespace handling, of course.
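
To see how the two corrected rules divide up an input once whitespace is skipped, here is a minimal Ruby sketch that mimics them with `StringScanner`. It is an approximation for experimenting, not a replacement for the Treetop grammar: `ADJECTIVE_WORD`, `OTHER_WORD`, and `classify_words` are hypothetical names, only lower-case letters and whitespace are handled, and trying the adjective pattern first plays the role of `!adjective` in the not_adjective rule.

```ruby
require "strscan"

# Analogue of: a:("good" / "bad" / "excellent") ![a-z]
ADJECTIVE_WORD = /(?:good|bad|excellent)(?![a-z])/
# Analogue of: w:([a-z]+), tried only after ADJECTIVE_WORD has failed.
OTHER_WORD = /[a-z]+/

# Splits the input into [:adjective, text] / [:not_adjective, text] pairs.
def classify_words(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace between words is dropped
    elsif (text = scanner.scan(ADJECTIVE_WORD))
      tokens << [:adjective, text]
    elsif (text = scanner.scan(OTHER_WORD))
      tokens << [:not_adjective, text]
    else
      # Punctuation, upper-case letters, etc. are deliberately not handled.
      raise ArgumentError, "unexpected character at #{scanner.pos}: #{scanner.peek(1).inspect}"
    end
  end
  tokens
end

p classify_words("a good badge")
# => [[:not_adjective, "a"], [:adjective, "good"], [:not_adjective, "badge"]]
```

With the boundary check in place, "badge" and "xyzgood" are each classified as a single non-adjective word. The question's full sample input would raise here because of the period and apostrophe, which is where the extra handling mentioned above comes in.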