Writing compilation mode regexps


The final working solution that takes into account line and column ranges:

(csharp
 "^ *\\(?:[0-9]+>\\)*\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),\\([0-9]+\\),\\([0-9]+\\)) *: \\(error\\|warning\\) *CS[0-9]+:"
 1 (2 . 4) (3 . 5))

Both answers below were incredibly helpful; I understand the system a lot better now.


Summary: my regexps work to match the output strings, but don't work in the compilation-error-regexp-alist-alist to match errors in my compilation output.

I'm finding the compilation mode regexps a bit confusing. Using re-builder and the original regexps in compile.el as a guide, I've written a regexp that I know matches my error string:

40>f:\Projects\dev\source\Helper.cs(37,22,37,45): error CS1061: 'foo.bar' does not contain a definition for 'function' and no extension method 'method' accepting a first argument of type 'foo.bar' could be found (are you missing a using directive or an assembly reference?)

And here's my regexp:

(pushnew '(csharp
 "^ *\\(?:[0-9]+>\\)*\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *\: \\(?:error *CS[0-9]+:\\)"
 2 3)
     compilation-error-regexp-alist-alist)

Obviously, I'm just trying to get to the first line/column pair that's output. (I'm surprised that the compiler is outputting 4 numbers instead of two, but whatever.)

If we look at the edg-1 regexp in compile.el:

    (edg-1
 "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
 1 2 nil (3 . 4))

So I guess where I'm confused is as to how the arguments are passed. In edg-1, where are 3 and 4 coming from? I guess they don't correspond to the capture groups? If I run the edg-1 regexp through re-builder on a well-formed error message and enter subexpression mode, 0 matches the whole matching string, 1 matches the file name and path, and 2 matches the line number. From the documentation (M-x describe-variable), it appears that it only cares about the position of each subexpression within the main expression. Either way, I'm clearly misunderstanding something.

I've also tried modifying the official csharp.el regexp to handle the extra two numbers, but with no luck.

(Edit, fixed the example slightly, updated the csharp regexp)

Best answer

Found some info on this.

This page has a simplified explanation:
http://praveen.kumar.in/2011/03/09/making-gnu-emacs-detect-custom-error-messages-a-maven-example/

Quote from page -

"Each elt has the form (REGEXP FILE [LINE COLUMN TYPE HYPERLINK
HIGHLIGHT...]).  If REGEXP matches, the FILE'th subexpression
gives the file name, and the LINE'th subexpression gives the line
number.  The COLUMN'th subexpression gives the column number on
that line"

So it looks like the format is something like this:

(REGEXP FILE [LINE COLUMN TYPE HYPERLINK HIGHLIGHT...])

Looking at the regex again, it looks like a modified BRE: groups and alternation are written \( \) and \|, while a bare ( or ) is a literal parenthesis.

 ^                   # Beginning of line
 \( [^ \n]+ \)       # Group 1

 (                   # Literal '('
 \( [0-9]+ \)        # Group 2
 )                   # Literal ')'

 : [ ] 

 \(?:
      error
   \|
      warnin\(g\)    # Group 3
   \|
      remar\(k\)     # Group 4
 \)

Here is the edg-1 element:

(edg-1
 "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
 1 2 nil (3 . 4))

Where

"^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"
REGEXP ^^^^^^^^

 1     2    nil    (3 . 4)
 ^     ^     ^      ^^^^^
FILE LINE  COLUMN   TYPE

"TYPE is 2 or nil for a real error or 1 for warning or 0 for info.
TYPE can also be of the form (WARNING . INFO).  In that case this
will be equivalent to 1 if the WARNING'th subexpression matched
or else equivalent to 0 if the INFO'th subexpression matched."

So, TYPE is of this form (WARNING . INFO)

In the regex,
if capture group 3 matched (i.e. warnin\(g\) ), it is equivalent to a warning.  
If capture group 4 matched (i.e. remar\(k\) ), it is equivalent to info.  
If neither group matched (i.e. the text was error), it is treated as a real error.  
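
To see the (WARNING . INFO) rule in action, here is a minimal sketch that runs the edg-1 regexp over three made-up message lines and reports the type each would get. The helper name my-edg-type and the sample lines are hypothetical, not part of compile.el; the mapping follows the docstring quoted above, and the fallback to 2 when neither group matched is my reading of it.

```elisp
;; Hypothetical helper, not part of compile.el: classify a line the way
;; a TYPE of (3 . 4) is described in the docstring.
(defun my-edg-type (line)
  "Return 1 (warning), 0 (info) or 2 (error) for LINE, or nil if no match."
  (let ((re "^\\([^ \n]+\\)(\\([0-9]+\\)): \\(?:error\\|warnin\\(g\\)\\|remar\\(k\\)\\)"))
    (when (string-match re line)
      (cond ((match-beginning 3) 1)   ; group 3 took part: "warning"
            ((match-beginning 4) 0)   ; group 4 took part: "remark"
            (t 2)))))                 ; neither took part: plain "error"

(my-edg-type "foo.c(12): warning: unused variable")   ; => 1
(my-edg-type "foo.c(12): remark: something minor")    ; => 0
(my-edg-type "foo.c(12): error: missing semicolon")   ; => 2
```

The groups around the single letters g and k exist only so that the element can tell which alternative matched.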

csharp element info

Looking at your csharp element

"^ *\\(?:[0-9]+>\\)?\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *\: \\(?:error *CS[0-9]+:\\)"
2 3 4

Your regex (broken down below) has only three capture groups, so there is no group 4.
Your FILE LINE COLUMN of 2 3 4
should therefore probably be 1 2 3.

Here is your regex as the engine sees it:

 ^ 
 [ ]* 
 \(?:
      [0-9]+ > 
 \)?
 \(                            # Group 1
      \(?:
            [a-zA-Z] : 
      \)?
      [^:(\t\n]+ 
 \)
 (                             # Literal '('
      \( [0-9]+ \)                # Group 2
      ,
      \( [0-9]+ \)                # Group 3
      ,
      [0-9]+
      ,
      [0-9]+ 
 )                             # Literal ')'
 [ ]* \: [ ] 
 \(?:
      error [ ]* CS [0-9]+ :
 \)
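
A quick way to confirm the group numbering is to run the regexp over the sample line with string-match and look at what each group captured. This is only a sketch: the error message is shortened, and the \: from the original is written as a plain : (as far as I know both read as the same character in a Lisp string).

```elisp
;; Sketch: check which subexpression captures what on the sample line.
(let ((re "^ *\\(?:[0-9]+>\\)?\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *: \\(?:error *CS[0-9]+:\\)")
      (line "40>f:\\Projects\\dev\\source\\Helper.cs(37,22,37,45): error CS1061: 'foo.bar' does not contain a definition for 'function'"))
  (when (string-match re line)
    (list (match-string 1 line)     ; FILE
          (match-string 2 line)     ; LINE
          (match-string 3 line))))  ; COLUMN
;; => ("f:\\Projects\\dev\\source\\Helper.cs" "37" "22")
```

Evaluating this in *scratch* or with M-: shows the file name in group 1, the line in group 2 and the column in group 3, which is why 1 2 3 is the combination to try.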

Another answer

My crystal ball came up with a weird explanation: compilation-error-regexp-alist-alist is just a collection of matching rules, but it doesn't say which rules to use. So you need to add csharp to compilation-error-regexp-alist if you want to use your new rule.
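
In an init file that could look something like the sketch below. It assumes the rule is named csharp as in the question and uses the 1 2 3 indices suggested in the other answer; adjust the regexp and indices to whatever you end up with.

```elisp
(require 'compile)

;; Define the rule under the name `csharp' ...
(add-to-list 'compilation-error-regexp-alist-alist
             '(csharp
               "^ *\\(?:[0-9]+>\\)?\\(\\(?:[a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\),\\([0-9]+\\),[0-9]+,[0-9]+) *: \\(?:error *CS[0-9]+:\\)"
               1 2 3))

;; ... and then enable it by name, so compilation mode actually uses it.
(add-to-list 'compilation-error-regexp-alist 'csharp)
```

Buffers that are already in compilation mode may need a recompile before the new rule shows any effect.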

As for the meaning of (3 . 4), see C-h v compilation-error-regexp-alist.