Is the ’ symbol special in Boost regex?


Regular expression: “[^”]*“

String: “lips“

Result: match

String: “lips’“

Result: no match

I expect both strings to match.

C++ code:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;

int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    if (regex_search(s1, regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (regex_search(s2, regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}

output: s1 matched

Is the ’ symbol special? Why does the second string not match?

Best answer (by Alex):

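By default, Boost.Regex does not treat a std::string as UTF-8; it matches byte by byte. In UTF-8 the closing quote ” is encoded as E2 80 9D and the apostrophe ’ as E2 80 99, so the two share their first two bytes. The class [^”] therefore excludes the single bytes E2, 80 and 9D, and the apostrophe's bytes E2 and 80 are rejected. That is why the regex does not match the second string. Code for UTF-8, using the ICU support in Boost.Regex: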

#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/regex/icu.hpp>

using namespace std;
using namespace boost;

int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    if (u32regex_search(s1, make_u32regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (u32regex_search(s2, make_u32regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}

compilation: g++ -std=c++11 ./test.cc -licuuc -lboost_regex

output:

s1 matched
s2 matched
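
To check the shared-byte explanation yourself, here is a small sketch that uses only the standard library (no Boost). The helper names hex_bytes and shared_bytes are made up for illustration and are not part of any library. The examples pass the characters as hex escapes so the result does not depend on the source file encoding.

```cpp
#include <cstdio>
#include <string>

// Render each byte of s as two lowercase hex digits, separated by spaces.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Return the distinct bytes of text that also occur in cls.
// In byte mode, a negated class built from cls rejects exactly these bytes.
std::string shared_bytes(const std::string& cls, const std::string& text)
{
    std::string out;
    for (char c : text) {
        const bool in_class = cls.find(c) != std::string::npos;
        const bool already_listed = out.find(c) != std::string::npos;
        if (in_class && !already_listed) out += c;
    }
    return out;
}
```

For example, hex_bytes(shared_bytes("\xE2\x80\x9D", "\xE2\x80\x99")) gives "e2 80": the apostrophe (second argument) contains two of the bytes that [^”] excludes, so a byte-wise match stops there. With "lips" as the second argument the result is empty, which is why the first string matches.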