GOLD Parser Including Comments

I'm working on a project to convert one language to another and am using GOLD Parser. I need to carry comments through the conversion, as we do not want to lose them. The problem is that CommentLine and CommentBlock are treated as noise: they are captured and thrown away. Is there a simple way to turn this behavior off so that when a comment is read, it is sent through the rest of the tree and I can treat it like any other statement?

If not, can somebody help me convert the CommentLine into a rule that, when parsed, is treated like any other statement? I'm using the VBScript grammar from the GOLD Parser website:

! Special comment definition
Comment Line   =    ''

My only other option at this point is, whenever my engine reads a comment token, to take the raw data and source line number and put them into a dictionary that I can refer back to as other tokens are processed. This is doable, but can get messy.
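
For reference, the dictionary fallback described above might look roughly like the following Python sketch. The `Token` class and both helper functions are hypothetical names made up for illustration; they are not part of any GOLD engine, and a real engine would expose its own token type and position information.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token shape; not the GOLD engine's actual token class.
    kind: str   # e.g. "Comment" or "Identifier"
    text: str   # raw source text of the token
    line: int   # 1-based source line number


def collect_comments(tokens):
    """Split a token stream into non-comment tokens and a line -> comments map."""
    comments = {}
    kept = []
    for tok in tokens:
        if tok.kind == "Comment":
            # Several comments may share a line, so store a list per line.
            comments.setdefault(tok.line, []).append(tok.text)
        else:
            kept.append(tok)
    return kept, comments


def comments_up_to(comments, line):
    """Remove and return, in source order, all comments on or before `line`."""
    ready = sorted(n for n in comments if n <= line)
    return [text for n in ready for text in comments.pop(n)]
```

The converter would call `comments_up_to` with the line number of each statement it emits, so that pending comments are written out just before that statement. The messy part is deciding where a comment belongs when it sits between or after statements.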

There are 1 best solutions below


Since version 5.0, GOLD Parser has changed the way it handles multiple groups that share an ending terminal. As a result, the definition you are using does not work. (I assume you just removed the Rem part so the grammar would build?)

Version 5.0 introduced two main changes:

  • Lexical Groups
  • Groups and terminal attributes can be changed

Because of this, any newline terminal you define will be used; if a newline is needed (as with the line comment) and none is defined, one is declared automatically.

Anything named Comment plus a suffix (Comment Line, Comment Block) is automatically classified as "noise" and removed during parsing. To keep comments from being treated as noise, you need to tell GOLD explicitly that they are essential to the parser logic.

Also, the code you were using only matched the start of the comment and did nothing with it. To "capture" everything after the ' symbol, we need to declare what we are looking for. You can accomplish this with something along these lines:

! Special whitespace definition (all whitespace characters excluding newlines)
{WS} = {Whitespace} - {CR} - {LF} 

! Special Comment Line definition (alphanumerics, the special whitespace and the listed symbols, up to a line break)
Comment Line = ''({Alphanumeric} | {WS} | [.,-+="] )*{All Newline}  
Rem Line = rem

Comment Line @= { type = Content }
Rem @= { type = Content }

With this, both are declared as line-based groups (Comment Line and Rem Line), and both are given the type Content. They are therefore treated as content instead of the default noise, and should not be removed by the parser.
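
To make the noise/content distinction concrete, here is a small Python sketch. It is a toy model only, not GOLD's engine or API: the regex lexer, the `lex` and `parser_input` functions, and the token kind names are all assumptions chosen for illustration.

```python
import re

# Toy lexer for illustration only: it recognises VBScript-style line comments
# (an apostrophe up to end of line), bare words and whitespace, nothing else.
TOKEN_RE = re.compile(r"(?P<Comment>'[^\r\n]*)|(?P<Word>\w+)|(?P<Whitespace>\s+)")


def lex(source):
    """Yield (kind, text) pairs for every match in `source`."""
    for m in TOKEN_RE.finditer(source):
        yield m.lastgroup, m.group()


def parser_input(source, noise):
    """Return the tokens that survive once every kind in `noise` is discarded."""
    return [(kind, text) for kind, text in lex(source) if kind not in noise]


source = "x ' keep me\ny"

# Default behaviour: comments are noise and never reach the parser.
default = parser_input(source, noise={"Whitespace", "Comment"})

# With Comment reclassified as content, it arrives like any other token.
as_content = parser_input(source, noise={"Whitespace"})
```

In the first call the comment is gone before the parser sees anything. In the second it shows up between the two words, which also means the grammar's rules then have to accept a comment token wherever one can appear.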

Hope this helps. For further reading:

http://goldparser.org/doc/grammars/define-groups.htm

http://goldparser.org/doc/grammars/group-attributes.htm

http://goldparser.org/doc/grammars/example-group.htm