POSIX, aka "The Open Group Base Specifications Issue 7, 2018 edition", has this to say about regular expression operator precedence:
9.4.8 ERE Precedence
The order of precedence shall be as shown in the following table:
    ERE Precedence (from high to low)
    Collation-related bracket symbols    [==] [::] [..]
    Escaped characters                   \<special-character>
    Bracket expression                   []
    Grouping                             ()
    Single-character-ERE duplication     * + ? {m,n}
    Concatenation                        ab
    Anchoring                            ^ $
    Alternation                          |
I am curious why the first two levels are in that order. As a Unix user from way back, I am used to being able to "throw a backslash in front of it" to escape virtually anything. But it appears that with collation-related bracket symbols (CRBS), I can't do that. If I want to match a literal [.ch.], I can't just type \[.ch.] and rely on "dot matches dot" to handle things for me. I now have to match something like [[].ch.] (or possibly worse?).
I'm trying, and failing, to imagine the scenario in which whoever-thought-this-up decided on this order. Is there a concrete case where ranking CRBS higher than backslash makes sense, or was this a case of "we don't understand CRBS yet so let's make it higher priority" or ... what, exactly?
At least for GNU grep, it looks like lib/dfa.c treats a CRBS as a single lexical token, as handled in the function parse_bracket_exp().
For the example given, escaping the special characters (the square brackets and dots) seems to give the results you are looking for. You can also match literal dots with [.], which might be easier to read in a regular expression.
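To make the suggestions above concrete, here is a minimal sketch using the POSIX regcomp()/regexec() interface with REG_EXTENDED. The helper name ere_match is hypothetical, and the sketch assumes the default "C" locale (no setlocale() call), so no multi-character collating elements such as ch are defined. The closing ] is left unescaped: outside a bracket expression it is already an ordinary character in an ERE, and POSIX leaves a backslash before an ordinary character undefined.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `subject` contains a match for the
 * POSIX extended regular expression `pattern`, 0 if it does not (or if
 * regexec() reports an error), and -1 if the pattern fails to compile.
 */
int ere_match(const char *pattern, const char *subject)
{
    assert(pattern != NULL && subject != NULL);

    regex_t re;
    /* REG_EXTENDED selects ERE syntax; REG_NOSUB because only a yes/no
       answer is needed, not the match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/*
 * Example patterns (written as C string literals, so each regex
 * backslash is doubled):
 *   "\\[\\.ch\\.]"  escaped '[' and '.': matches only the literal text [.ch.]
 *   "\\[.ch.]"      escaped '[' only: each '.' matches any character
 *   "a[.]b"         '.' inside a bracket expression is literal
 *   "[[].ch.]"      '[' matched via a bracket expression instead of a backslash
 */
```

For instance, ere_match("\\[\\.ch\\.]", "[xchy]") returns 0 because the escaped dots only match literal dots, while the looser "\\[.ch.]" accepts it. Likewise ere_match("a[.]b", "a.b") returns 1 and ere_match("a[.]b", "axb") returns 0.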