I am working on a project (running on a French Debian machine) where std::wstrings are heavily used. Some of these strings contain accented characters.
I am a bit puzzled that std::wcout sometimes fails to display a std::wstring correctly.
Here are some examples:
#include <iostream>
#include <string>
int main() {
const std::string accentuated_string = "str {grandma: mémère}";
std::cout << accentuated_string << std::endl; // prints "str {grandma: mémère}"
const std::wstring accentuated_wstring = L"wstr {grandma: mémère}";
std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: m�m�re}"
}
If I remove the initial std::string, the � characters are replaced by ?:
#include <iostream>
#include <string>
int main() {
const std::wstring accentuated_wstring = L"wstr {grandma: mémère}" ;
std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: m?m?re}"
}
Now, if I add some setlocale calls and keep the std::string commented out, I get:
#include <clocale>
#include <iostream>
#include <string>
int main() {
constexpr auto locale_name = "fr_FR.UTF-8";
for (const auto lc_type : {LC_ALL, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION}) {
setlocale(lc_type, locale_name);
}
//const std::string accentuated_string = "str {grandma: mémère}";
//std::cout << accentuated_string << std::endl; // always prints "str {grandma: mémère}"
const std::wstring accentuated_wstring = L"wstr {grandma: mémère}";
std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: mémère}" !!!
}
But when I uncomment the std::string, the wide string is garbled again:
#include <clocale>
#include <iostream>
#include <string>
int main() {
constexpr auto locale_name = "fr_FR.UTF-8";
for (const auto lc_type : {LC_ALL, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION}) {
setlocale(lc_type, locale_name);
}
const std::string accentuated_string = "str {grandma: mémère}";
std::cout << accentuated_string << std::endl; // always prints "str {grandma: mémère}"
const std::wstring accentuated_wstring = L"wstr {grandma: mémère}";
std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: m�m�re}"
}
I have read that std::wstring should be avoided on Linux.
However, I am curious to understand these behaviours:
- why does wcout print these weird characters?
- why does calling setlocale help?
- why does displaying a std::string first break the display of a std::wstring afterwards?
Note: my system is configured for French UTF-8:
(base) ➜ ~ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=
(base) ➜ ~ echo $LANG
fr_FR.UTF-8
(base) ➜ ~ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
(base) ➜ ~ file -bi main.cpp
text/x-c; charset=utf-8