How to display std::wstring with accented characters


I am working on a project (running on a French Debian machine) where std::wstring is heavily used. Some of these strings contain accented characters.

I am a bit puzzled that std::wcout sometimes fails to display these std::wstring values correctly.

Here are some examples:

#include <iostream>

int main() {

    const std::string accentuated_string = "str {grandma: mémère}" ;
    std::cout << accentuated_string << std::endl; // prints "str {grandma: mémère}"

    const std::wstring accentuated_wstring = L"wstr {grandma: mémère}" ;
    std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: m�m�re}"
}

If I remove the initial std::string output, the � characters are replaced by ?:

#include <iostream>
int main() {
    const std::wstring accentuated_wstring = L"wstr {grandma: mémère}" ;
    std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: m?m?re}"
}

Now, if I add a call to setlocale and keep the std::string output commented out, the wide string is displayed correctly:

#include <clocale>
#include <iostream>
#include <string>

int main() {
    constexpr auto encoding = "fr_FR.UTF-8";
    // LC_ALL sets every locale category at once.
    std::setlocale(LC_ALL, encoding);

    //const std::string accentuated_string = "str {grandma: mémère}" ;
    //std::cout << accentuated_string << std::endl; // always prints "str {grandma: mémère}"

    const std::wstring accentuated_wstring = L"wstr {grandma: mémère}" ;
    std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: mémère}" !!!
}

But when I uncomment the std::string output, the wide string fails again:

#include <clocale>
#include <iostream>
#include <string>

int main() {
    constexpr auto encoding = "fr_FR.UTF-8";
    // LC_ALL sets every locale category at once.
    std::setlocale(LC_ALL, encoding);

    const std::string accentuated_string = "str {grandma: mémère}" ;
    std::cout << accentuated_string << std::endl; // always prints "str {grandma: mémère}"

    const std::wstring accentuated_wstring = L"wstr {grandma: mémère}" ;
    std::wcout << accentuated_wstring << std::endl; // prints "wstr {grandma: m�m�re}"
}

I have read that std::wstring should be avoided on Linux.

However, I am curious to have an explanation of these behaviours:

  • Why does wcout print these weird characters?
  • Why might calling setlocale help?
  • Why does displaying a std::string first lead to an error when a std::wstring is displayed afterwards?
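
One property of C streams that may bear on the third question: a stream such as stdout acquires an orientation (byte or wide) from the first operation performed on it, and std::fwide cannot change that orientation once it is set. The sketch below only illustrates that rule on an arbitrary FILE*; it is not a diagnosis of what libstdc++ does internally, and orientation_of and orientation_after_narrow_write are hypothetical helper names introduced for the illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>

// Report the current orientation of a C stream without changing it.
// std::fwide(stream, 0) only queries: negative means byte-oriented,
// positive means wide-oriented, zero means no orientation yet.
std::string orientation_of(std::FILE* stream) {
    assert(stream != nullptr);
    const int mode = std::fwide(stream, 0);
    if (mode < 0) return "byte";
    if (mode > 0) return "wide";
    return "unset";
}

// Hypothetical helper: do one narrow write on the stream, then ask for
// wide orientation and report what the stream ended up with.
std::string orientation_after_narrow_write(std::FILE* stream) {
    assert(stream != nullptr);
    std::fputs("narrow first\n", stream);  // a narrow write on an unoriented stream makes it byte-oriented
    std::fwide(stream, 1);                 // request wide orientation; ignored once an orientation is set
    return orientation_of(stream);
}
```

Calling orientation_of(stdout) before and after the std::cout line in the failing program might show whether the same thing happens there, assuming std::cout is still synchronized with stdout (the default).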

Note: my system is configured for French UTF-8:

(base) ➜  ~ locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=

(base) ➜  ~ echo $LANG
fr_FR.UTF-8

(base) ➜  ~ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110

(base) ➜  ~ file -bi main.cpp              
text/x-c; charset=utf-8
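
Since the terminal and the source file both use UTF-8, one possible way to avoid mixing std::cout and std::wcout on the same output is to convert each std::wstring to UTF-8 and send everything through std::cout. The following is a minimal sketch of that idea, not a recommendation from any library documentation. It assumes each wchar_t holds one UTF-32 code point, which holds on Linux with glibc but not on Windows; to_utf8 and append_utf8 are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);  // internal invariant: the caller has already sanitized cp
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Assumption: wchar_t is 32 bits wide and stores UTF-32 code points.
std::string to_utf8(const std::wstring& wide) {
    static_assert(sizeof(wchar_t) >= 4, "sketch assumes a 32-bit wchar_t");
    std::string out;
    for (const wchar_t wc : wide) {
        char32_t cp = static_cast<char32_t>(wc);
        // Replace surrogates and out-of-range values with U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        append_utf8(out, cp);
    }
    return out;
}
```

With this helper, the wide string from the examples could be written as std::cout << to_utf8(accentuated_wstring) << std::endl; so that only the narrow stream ever touches stdout.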