how to skip all single- and multi-line comments in a Parse::RecDescent parser


In Parse::RecDescent, how do I effectively ignore C++/Java-style comments? This includes single-line comments ('//' until the end of the line) and multi-line comments ('/* everything between here */').


2
On BEST ANSWER

The <skip> directive defines what the parser treats as whitespace, that is, what it skips before each terminal.

parse: <skip: qr{(?xs:
          (?: \s+                       # Whitespace
          |   /[*] (?:(?![*]/).)* [*]/  # Inline comment
          |   // [^\n]* \n?             # End of line comment
          )
       )*}>
       main_rule
       /\Z/
       { $item[2] }
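
To see what this pattern would consume, it can be tried outside the parser with a plain substitution. The following is a minimal sketch using only core Perl; the input string and the variable names ($skip, $text, $rest) are made up for illustration, and the substitution only approximates what the parser does before a terminal.

```perl
use strict;
use warnings;

# Same pattern as in the <skip> directive above, stored in a variable.
my $skip = qr{(?xs:
    (?: \s+                       # Whitespace
    |   /[*] (?:(?![*]/).)* [*]/  # Inline comment
    |   // [^\n]* \n?             # End of line comment
    )
)*};

# Hypothetical input: whitespace and both comment styles before a token.
my $text = "  /* block\n comment */ // line comment\n  foo = 1;";

# Strip the skippable prefix from a copy of the input.
(my $rest = $text) =~ s/\A$skip//;

print "$rest\n";   # prints "foo = 1;"
```

Everything up to the first real token is removed in one step, because the outer `*` lets whitespace and comments alternate in any order.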

Unlike Nate Glenn's solution, mine

  • Doesn't set a global variable affecting all parsers.
  • Doesn't use needless captures.
  • Doesn't use the non-greedy modifier. (He used the non-greedy modifier to make sure certain characters aren't matched at certain spots, but the non-greedy modifier doesn't guarantee that.)

Note: (?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
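
The difference between the non-greedy modifier and the lookahead idiom shows up as soon as something has to match after the comment. Here is a small core-Perl sketch under assumed input (the string and the trailing `\s y` are invented for the demonstration); it is not taken from either answer's grammar.

```perl
use strict;
use warnings;

# Hypothetical input: two block comments with code between them.
my $text = "/* a */ x /* b */ y";

# Non-greedy: when "y" fails to match after the first comment, the regex
# engine backtracks and lets .*? run past the first "*/".
my ($lazy) = $text =~ m{\A( /[*] .*? [*]/ ) \s y}xs;

# Lookahead idiom: each "." is only allowed where "*/" does not start,
# so the comment body can never contain "*/" and the match fails instead.
my ($tempered) = $text =~ m{\A( /[*] (?:(?![*]/).)* [*]/ ) \s y}xs;

print "lazy:     ", defined $lazy     ? $lazy     : "(no match)", "\n";
print "tempered: ", defined $tempered ? $tempered : "(no match)", "\n";
# lazy:     /* a */ x /* b */
# tempered: (no match)
```

The non-greedy version treats `x` as part of a comment, which is the kind of mismatch the lookahead version rules out.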

3
On

You have to set the value of $Parse::RecDescent::skip. By default, Parse::RecDescent skips all whitespace. If you set this variable to a regex matching both whitespace and comments, the parser skips them as well. Use this:

$Parse::RecDescent::skip = 
    qr{
        (
            \s+                 #whitespace
                |               #or
            /[*] .*? [*]/ \s*   #a multiline comment
                |               #or
            //.*?$               #a single line comment
        )*                      #zero or more

    }mxs;
# m lets '$' match at the end of each line (just before a newline), not only
# at the end of the string; x allows comments and whitespace inside the regex;
# s lets '.' match newlines.
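
The role of the m modifier in the single-line branch can be checked in isolation. The sketch below uses only core Perl and a made-up input string; it compares the same `// .*? $` fragment with and without m, with s left on in both cases as in the pattern above.

```perl
use strict;
use warnings;

# Hypothetical input: a line comment followed by a line of code.
my $src = "// first\nint x;";

# With m, '$' can match just before the newline, so the lazy '.*?'
# stops at the end of the comment line.
my ($with_m) = $src =~ m{\A( // .*? $ )}mxs;

# Without m, '$' only matches at the end of this string, and since s lets
# '.' cross the newline, the "comment" swallows the next line too.
my ($without_m) = $src =~ m{\A( // .*? $ )}xs;

print "with m:    [$with_m]\n";
print "without m: [$without_m]\n";
```

With m the capture is `// first`; without it the capture is the whole two-line string, so combining s with `.*?$` depends on m being present.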