Printing em-dash to console window using printf?


A simple problem: I'm writing a chatroom program in C++ (but it's primarily C-style) for a class, and I'm trying to print “#help — display a list of commands...” to the output window. While I could use two hyphens (--) to achieve roughly the same effect, I'd rather use an em-dash (—). printf(), however, doesn't seem to support printing em-dashes. Instead, the console prints the character ù in its place, even though typing an em-dash directly into the prompt works fine.

How do I get this simple Unicode character to show up?

Looking at Windows alt key codes, I find it interesting how alt+0151 is "—" and alt+151 is "ù". Is this related to my problem, or a simple coincidence?


There are 2 best solutions below

0
On

Windows is a Unicode (UTF-16) system, and so is the console. If you want to print Unicode text, the most direct and efficient way is to call WriteConsoleW:

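#include <windows.h>
#include <wchar.h>

// Writes a wide (UTF-16) string straight to the console, with no code page conversion.
// Note: WriteConsoleW fails if stdout is redirected to a file or pipe.
BOOL PrintString(PCWSTR psz)
{
    DWORD n; // receives the number of characters actually written
    return WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), psz, (ULONG)wcslen(psz), &n, 0);
}
// usage (from inside a function):
PrintString(L"—");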

In this case the binary contains the wide character itself (the two bytes of 0x2014), and the console prints it as is.
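As a small, portable illustration of that claim, the sketch below counts and dumps the UTF-16 code units of a string literal. It is only an approximation of the Windows case: it uses char16_t as a stand-in for Windows' 16-bit wchar_t (an assumption, so that the sketch behaves the same on other platforms), and the helper names utf16_length and dump_code_units are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Number of UTF-16 code units before the terminating zero.
std::size_t utf16_length(const char16_t* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Prints each code unit in hex, e.g. "0x2014 " for the em-dash.
void dump_code_units(const char16_t* s) {
    assert(s != nullptr);
    for (std::size_t i = 0; s[i] != 0; ++i)
        std::printf("0x%04X ", static_cast<unsigned>(s[i]));
    std::printf("\n");
}
```

Calling dump_code_units(u"\u2014") shows a single code unit, 0x2014. The escape \u2014 is used instead of a literal em-dash so the result does not depend on how the source file was saved.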

If an ANSI (multi-byte) function is used for console output, such as WriteConsoleA or WriteFile, the console first translates the multi-byte string to Unicode with MultiByteToWideChar, using the value returned by GetConsoleOutputCP as the code page. This translation is where problems can appear if you use characters outside the ASCII range (0x80 and above).

First of all, the compiler may give you a warning: "The file contains a character that cannot be represented in the current code page (number). Save the file in Unicode format to prevent data loss." (C4819). But even after you save the source file in a Unicode format, you can still get the following:

wprintf(L"ù"); // no warning
printf("ù"); //warning C4566

L"ù" is stored in the binary as a wide character string, exactly as written, so there are no problems and no warning. But "ù" is stored as a char (single-byte) string, so the compiler has to convert the Unicode text "ù" from the source file to a multi-byte string in the binary (the .obj file, from which the linker then creates the PE). For this the compiler uses WideCharToMultiByte with CP_ACP (the current system default Windows ANSI code page), and it reports C4566 when the character cannot be represented in that code page.

So what happens if you call printf("ù");?

  1. The Unicode string "ù" is converted to multi-byte with WideCharToMultiByte(CP_ACP, ...). This happens at compile time, and the resulting multi-byte string is stored in the binary.
  2. At run time, the console converts your multi-byte string to wide characters with MultiByteToWideChar(GetConsoleOutputCP(), ...) and prints the result.

So you get 2 conversions: unicode -> CP_ACP -> multi-byte -> GetConsoleOutputCP() -> unicode

By default GetConsoleOutputCP() == CP_OEMCP != CP_ACP, even if you run the program on the same computer where you compiled it (and all the more so on another computer with a different CP_OEMCP).

The problem is that the two conversions are incompatible, because different code pages are used. And even if you change the console code page to your CP_ACP, the conversion can still translate some characters incorrectly.
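To make the mismatch concrete, here is a minimal, portable sketch of the two conversions for the em-dash from the question. It does not call the Windows API: the three helper functions are hypothetical stand-ins that model only ASCII and the single byte 0x97. It also assumes Windows-1252 as the ANSI code page and CP437 as the console's OEM code page, which is common on English-language systems but may differ on your machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Step 1 (compile time): encode a code point into the assumed ANSI code page.
// Only ASCII and U+2014 are modelled; anything else becomes '?'.
std::uint8_t encode_cp1252(char32_t cp) {
    if (cp < 0x80) return static_cast<std::uint8_t>(cp);
    if (cp == 0x2014) return 0x97;  // EM DASH in Windows-1252
    return '?';
}

// Step 2 (run time): the console decodes the byte with the assumed OEM code page.
// Only ASCII and 0x97 are modelled; anything else becomes U+FFFD.
char32_t decode_cp437(std::uint8_t byte) {
    if (byte < 0x80) return byte;
    if (byte == 0x97) return 0x00F9;  // LATIN SMALL LETTER U WITH GRAVE in CP437
    return 0xFFFD;
}

// For comparison: decoding with the same code page that was used to encode.
char32_t decode_cp1252(std::uint8_t byte) {
    if (byte < 0x80) return byte;
    if (byte == 0x97) return 0x2014;
    return 0xFFFD;
}

// Prints the byte stored for the em-dash and what each code page makes of it.
void show_round_trip() {
    const std::uint8_t stored = encode_cp1252(0x2014);
    assert(stored == 0x97);
    std::printf("byte stored in the binary: 0x%02X (decimal %u)\n",
                static_cast<unsigned>(stored), static_cast<unsigned>(stored));
    std::printf("decoded as CP437:          U+%04X\n",
                static_cast<unsigned>(decode_cp437(stored)));
    std::printf("decoded as Windows-1252:   U+%04X\n",
                static_cast<unsigned>(decode_cp1252(stored)));
}
```

Called from main, show_round_trip() reports the stored byte as 0x97 (decimal 151), which decodes to U+00F9 (ù) under CP437 and to U+2014 (the em-dash) under Windows-1252. That is also why the Alt codes in the question are related rather than a coincidence: as far as I know, Alt+151 is looked up in the OEM code page and Alt+0151 in the ANSI code page.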

As for the CRT function wprintf, the situation is as follows:

wprintf first converts the given string from Unicode to multi-byte using its internal current locale (note that the CRT locale is independent of, and can differ from, the console code page), and then calls WriteFile with the multi-byte string. The console converts this multi-byte string back to Unicode:

unicode -> current_crt_locale -> multi-byte -> GetConsoleOutputCP() -> unicode

So to use wprintf we first need to set the current CRT locale to GetConsoleOutputCP():

char sz[16];
sprintf(sz, ".%u", GetConsoleOutputCP());
setlocale(LC_ALL, sz);
wprintf(L"—");

But even so, on my computer I see - on screen instead of —. So the output is -— if PrintString(L"—") (which uses WriteConsoleW) is called right after this.

So the only reliable way to print any Unicode character (supported by Windows) is to use the WriteConsoleW API.

0
On

After going through the comments, I've found eryksun's solution to be the simplest (...and the most comprehensible):

#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    //other stuff
    _setmode(_fileno(stdout), _O_U16TEXT); // switch stdout to UTF-16 text mode
    wprintf(L"#help — display a list of commands...");
    return 0;
}

Portability isn't a concern of mine, and this solves my initial problem—no more ù—my beloved em-dash is on display.

I acknowledge this question is essentially a duplicate of the one linked by sata300.de, albeit with printf in place of cout, and unnecessary ramblings in place of relevant information.