I am implementing a substring search and would like to make it "accent-neutral" (it might also be called a rough or accent-insensitive search): if I search for "aba" in "rábano", the search should succeed.
In "Find substring in string using locale" there is a working answer:
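To make the goal concrete, a minimal sketch of accent-neutral matching might look like the block below. It is not the Boost or ICU approach: it assumes UTF-8 input and folds only the accented Latin-1 letters (the two-byte sequences starting with 0xC3) through a small hand-written table. The names BaseLetterForC3, FoldAccents and ContainsAccentNeutral are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <string>

// Toy table (an assumption, not a Unicode-complete mapping): returns the
// ASCII base letter for the UTF-8 sequence 0xC3 <second>, or '\0' if the
// sequence is not an accented letter covered here.
char BaseLetterForC3(unsigned char second)
{
    if (second >= 0x80 && second <= 0x85) return 'A';  // À..Å
    if (second == 0x87) return 'C';                    // Ç
    if (second >= 0x88 && second <= 0x8B) return 'E';  // È..Ë
    if (second >= 0x8C && second <= 0x8F) return 'I';  // Ì..Ï
    if (second == 0x91) return 'N';                    // Ñ
    if (second >= 0x92 && second <= 0x96) return 'O';  // Ò..Ö
    if (second >= 0x99 && second <= 0x9C) return 'U';  // Ù..Ü
    if (second == 0x9D) return 'Y';                    // Ý
    if (second >= 0xA0 && second <= 0xA5) return 'a';  // à..å
    if (second == 0xA7) return 'c';                    // ç
    if (second >= 0xA8 && second <= 0xAB) return 'e';  // è..ë
    if (second >= 0xAC && second <= 0xAF) return 'i';  // ì..ï
    if (second == 0xB1) return 'n';                    // ñ
    if (second >= 0xB2 && second <= 0xB6) return 'o';  // ò..ö
    if (second >= 0xB9 && second <= 0xBC) return 'u';  // ù..ü
    if (second == 0xBD || second == 0xBF) return 'y';  // ý, ÿ
    return '\0';
}

// Replaces each accented letter known to the table with its base letter;
// every other byte is copied unchanged.
std::string FoldAccents(const std::string& utf8)
{
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c == 0xC3 && i + 1 < utf8.size()) {
            const char base =
                BaseLetterForC3(static_cast<unsigned char>(utf8[i + 1]));
            if (base != '\0') {
                out.push_back(base);
                ++i;  // skip the continuation byte that was just consumed
                continue;
            }
        }
        out.push_back(utf8[i]);
    }
    return out;
}

// True if needle occurs in haystack once accents are folded on both sides.
bool ContainsAccentNeutral(const std::string& haystack,
                           const std::string& needle)
{
    return FoldAccents(haystack).find(FoldAccents(needle)) != std::string::npos;
}
```

With this sketch, ContainsAccentNeutral("r\xC3\xA1" "bano", "aba") is true, where "\xC3\xA1" is the UTF-8 encoding of "á". It only reports whether a match exists; a real solution also has to handle combining marks, other scripts and locale-specific rules, which is why a collation library is the better tool.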
#include <locale>
#include <string>
#include <boost/locale.hpp>

std::string NormalizeString(const std::string & input)
{
    std::locale loc = boost::locale::generator()("");
    const boost::locale::collator<char>& collator = std::use_facet<boost::locale::collator<char> >(loc);
    std::string result = collator.transform(boost::locale::collator_base::primary, input);
    return result;
}
The only issue with this solution is that transform appends several bytes to the end of the string. In my case they are "\x1\x1\x1\x1\x0\x0\x0": four bytes with the value 1 followed by several zero bytes. Of course it is easy to erase these bytes, but I would rather not rely on such subtle implementation details. (The code is supposed to be cross-platform.)
Is there a more reliable way?
As @R. Martinho Fernandes said, it looks impossible to implement such a search with Boost. I found a solution in the Chrome sources; it uses ICU.
Usage: