Parsing annotations within comments / Negative matching of strings in regexps

36 Views Asked by At

I'm defining the syntax of the Move IR. The test suite for this language includes various annotations to enable testing. I need to treat comments of this form specially:

//! new-transaction
// check: "Keep(ABORTED { code: 123,"

This file is an example arithmetic_operators_u8.mvir.

So far, I've got this working by disallowing ordinary single-line comments.

module MOVE-ANNOTATION-SYNTAX-CONCRETE
    imports INT-SYNTAX
    syntax #Layout ::= r"([\\ \\n\\r\\t])"       // Whitespace                                                          
    syntax Annotation ::= "//!" "new-transaction" [klabel(NewTransaction), symbol]                                      
    syntax Check ::= "//" "check:" "\"Keep(ABORTED { code:" Int ",\"" [klabel(CheckCode), symbol]                                                                                                                                         
endmodule                                                                                                               
                                                                                                                  
module MOVE-ANNOTATION-SYNTAX-ABSTRACT
    imports INT-SYNTAX
    syntax Annotation ::= "#NewTransaction" [klabel(NewTransaction), symbol]                                            
    syntax Check ::= #CheckCode(Int)        [klabel(CheckCode), symbol]                                                 
endmodule                   

I'd like to also be able to use ordinary comments. As a first step, I was able to change the Layout to allow commits only if the begin with a ! using r"(\\/\\/[^!][^\\n\\r]*)"

I'd like to exclude all comments that start with either //! or // check: from comments. What's a good way of implementing this? Where can I find documentation for the regular expression language that K uses?

1

There are 1 best solutions below

0
Dwight Guth On BEST ANSWER

K uses flex for its scanner, and thus for its regular expression language. As a result, you can find documentation on its regular expression language here.

You want a regular expression that expresses that comments can't start with ! or check:, but flex doesn't support negative lookahead or not patterns, so you will have to exhaustively enumerate all the cases of comments that don't start with those sequence of characters. It's a bit tedious, sadly.

For reference, here is a (simplified) regular expression drawn from the syntax of C that represents all pragmas that don't start with STDC. It should give you an idea of how to proceed:

#pragma([:space:]*)([^S].*|S|S[^T].*|ST|ST[^D].*|STD|STD[^C].*|STDC[_a-zA-Z0-9].*)?$