I am developing a C++ application that validates configuration files with regular expressions, using the Google RE2 library. The contents of each configuration file are read into a std::string.
So far, I have declared this string, which contains the regular expression:
const string EXPR_FAILED_FILE(R"([^\u0020-\u007E\n]|(\b.*(Mensagem|Antes|Loop|Movimentar|\|).*)|\\[0-9]{3,4})");
However, with the implementation below I am having trouble detecting some invalid characters in my test string (strInput):
bool checkStringConsistency(const string& strInput) {
    RE2 re(EXPR_FAILED_FILE);
    bool b_matches = RE2::FullMatch(strInput, re);
    return b_matches;
}
When I run the code, I get these messages on stderr:
re2/re2.cc:205: Error parsing '[^\u0020-\u007E\n]|(\b.*(Mensagem|Antes|Loop|Movimentar|\|).*)|\\[0-9]{3,4}': invalid escape sequence: \u
re2/re2.cc:890: Invalid RE2: invalid escape sequence: \u
It seems that RE2 does not recognize the \u sequence used to specify a Unicode range of characters. I tested this expression at regexr.com and the invalid characters were detected normally there.
What could be wrong here?
Each regex engine has its own syntax, and in RE2 you need to use [^\x{0020}-\x{007E}\n] instead of [^\u0020-\u007E\n]. See the syntax document: \u is used to match an uppercase character and is marked as NOT SUPPORTED.
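If a pattern was written for an engine that accepts \uXXXX and you would rather not edit it by hand, the substitution described above can be done mechanically. The following is only a minimal sketch using the standard library: convertUnicodeEscapes is a hypothetical helper name, not part of RE2, and it assumes every Unicode escape in the pattern has exactly four hex digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of RE2): rewrites each \uXXXX escape in a
// pattern into the \x{XXXX} form. Assumes exactly four hex digits per escape.
std::string convertUnicodeEscapes(const std::string& pattern) {
    std::string out;
    out.reserve(pattern.size() + 8);
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern[i] != '\\' || i + 1 >= pattern.size()) {
            out += pattern[i++];  // ordinary character, or a trailing backslash
            continue;
        }
        // pattern[i] is a backslash that starts an escape sequence.
        bool isUnicodeEscape = pattern[i + 1] == 'u' && i + 6 <= pattern.size();
        for (std::size_t k = i + 2; isUnicodeEscape && k < i + 6; ++k) {
            isUnicodeEscape =
                std::isxdigit(static_cast<unsigned char>(pattern[k])) != 0;
        }
        if (isUnicodeEscape) {
            out += "\\x{";
            out.append(pattern, i + 2, 4);  // the four hex digits
            out += '}';
            i += 6;
        } else {
            // Copy the escape pair verbatim, so an escaped backslash
            // followed by a literal 'u' is left untouched.
            out.append(pattern, i, 2);
            i += 2;
        }
    }
    return out;
}
```

The returned string can then be passed to the RE2 constructor in place of EXPR_FAILED_FILE. Checking re.ok() before matching should surface any remaining parse error instead of only logging it to stderr.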