Windows NT uses Unicode (two-byte UTF-16) as the default encoding throughout the Windows NT API. If you choose ASCII or a multibyte character set as your default character set instead, the system transforms those strings to Unicode, and using the ASCII character set will be slower than using Unicode.
What does this transformation mean? Does it only transform the ASCII API calls to their Unicode counterparts, or does it transform all strings?
For example:
If you create a C/C++ file containing const char* text = "Hello, world!"; and compile it on Windows NT, does the compiled binary store "Hello, world!" as Unicode (26 bytes) or as ASCII (13 bytes)?

The compiler doesn't change the type of your strings. It will encode them as you declare them.
Windows NT and its successors (2000, XP, 2003, Vista, 7, 8, 8.1, 10) internally use 2-byte characters (which they call "wide characters"). Windows NT originally used the UCS-2 encoding; since Windows 2000 it has used UTF-16LE.
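As a small aside, a sketch (my own illustration, not from the answer) showing that on Windows the wide-character types are 2 bytes, i.e. one UTF-16 code unit:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* On Windows, wchar_t and the API's WCHAR are both 2 bytes wide:
       one UTF-16 code unit. */
    printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t)); /* 2 */
    printf("sizeof(WCHAR)   = %u\n", (unsigned)sizeof(WCHAR));   /* 2 */
    return 0;
}
```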
Most of its API functions that handle strings come in two versions: the name of the one that handles ANSI strings ends in A, and the name of the other ends in W ("W" as in "wide characters"). A set of macro definitions maps the names without a suffix to either the A or the W version; the selection is driven by whether the UNICODE macro is defined (the C runtime's generic-text mappings use _UNICODE). The programmer is, however, free to call the A or the W function directly if the situation calls for it.
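For illustration, a minimal sketch of calling both variants explicitly and via the generic macro (MessageBox is just one example of an API that exists in A and W versions):

```c
#include <windows.h>

int main(void)
{
    /* Explicit ANSI version: takes char* (ANSI) strings. */
    MessageBoxA(NULL, "Hello, world!", "ANSI caption", MB_OK);

    /* Explicit wide version: takes wchar_t* (UTF-16) strings. */
    MessageBoxW(NULL, L"Hello, world!", L"Wide caption", MB_OK);

    /* Generic macro: expands to MessageBoxA or MessageBoxW depending on
       whether UNICODE is defined; TEXT() produces "..." or L"..." to match. */
    MessageBox(NULL, TEXT("Hello, world!"), TEXT("Generic caption"), MB_OK);

    return 0;
}
```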
To help developers handle wide-character strings, the standard C library provided by Microsoft contains a set of functions for wide-char strings (the equivalents of strlen(), strcat(), and so on). Their names usually have str replaced with wcs.
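A short sketch of the narrow functions next to their wide counterparts (buffer sizes chosen arbitrarily for the example):

```c
#include <stdio.h>
#include <string.h>   /* strlen, strcat */
#include <wchar.h>    /* wcslen, wcscat */

int main(void)
{
    char    narrow[32] = "Hello, ";
    wchar_t wide[32]   = L"Hello, ";

    strcat(narrow, "world!");    /* narrow-string version   */
    wcscat(wide, L"world!");     /* wide-string counterpart */

    printf("strlen: %u\n", (unsigned)strlen(narrow)); /* 13 characters, 13 bytes */
    printf("wcslen: %u\n", (unsigned)wcslen(wide));   /* 13 characters, 26 bytes */
    return 0;
}
```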
The programmer decides which version of each function to use. Most of the time there is no need to convert the encoding (as long as you stick to one of the above). However, some subsystems leave you no choice: you have to convert the strings to Unicode to make them work.
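When such a conversion is needed, the Win32 functions MultiByteToWideChar() and WideCharToMultiByte() do it. A minimal sketch (error handling kept to a bare check):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    const char *ansi = "Hello, world!";
    wchar_t wide[64];

    /* Convert an ANSI (CP_ACP) string to UTF-16; -1 means "including the terminator". */
    int n = MultiByteToWideChar(CP_ACP, 0, ansi, -1, wide, 64);
    if (n > 0)
        printf("converted %d wide characters: %ls\n", n - 1, wide);

    return 0;
}
```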
You can read more about how Windows handles the strings in the API: https://msdn.microsoft.com/en-us/library/windows/desktop/ff381407%28v=vs.85%29.aspx
To answer your question: Windows doesn't change your strings. It only converts strings internally from ANSI to Unicode when they are passed to the A versions of its API functions, and it converts back from Unicode to ANSI (when possible) the strings returned by the A versions (GetWindowTextA(), for example).
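To make that last point concrete, a sketch (using the console window purely as a convenient HWND, my own choice): the W call returns the window text as Windows stores it, while the A call returns an ANSI copy that Windows produced by converting the internal UTF-16 data:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HWND hwnd = GetConsoleWindow();   /* any window handle would do */
    char    ansiTitle[256] = "";
    wchar_t wideTitle[256] = L"";

    /* The A version converts the internal UTF-16 text to ANSI for us. */
    GetWindowTextA(hwnd, ansiTitle, 256);

    /* The W version returns the internal UTF-16 text as-is. */
    GetWindowTextW(hwnd, wideTitle, 256);

    printf("ANSI title: %s\n",  ansiTitle);
    printf("Wide title: %ls\n", wideTitle);
    return 0;
}
```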
You have to decide which API version you use: ANSI or Unicode. Either you call the functions explicitly (CreateFileA for ANSI, CreateFileW for Unicode), or you use the function name without the 'A' or 'W' and let the UNICODE preprocessor macro (with _UNICODE for the C runtime) decide which of the two is used. Certain functions take structs that contain strings; those structs come in two versions as well (OSVERSIONINFOA and OSVERSIONINFOW, for example). There is no good reason to use ANSI nowadays.
But this only applies to arguments, not content. If you write a string to a file using a pointer to the data and its size, no translation takes place.
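Putting both points together, a minimal sketch (the path is made up for illustration): the API function is chosen explicitly as the W version, so the file name is UTF-16, but the buffer handed to WriteFile() is copied to the file byte for byte, so the char string stays 13 bytes on disk no matter which flavour of the API is in effect:

```c
#include <windows.h>

int main(void)
{
    const char *text = "Hello, world!";   /* 13 bytes of narrow-character data */

    /* Explicit wide (W) version of the API: the *file name* is UTF-16... */
    HANDLE h = CreateFileW(L"C:\\temp\\example.txt",   /* hypothetical path */
                           GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* ...but the *content* is written exactly as given: 13 bytes, no translation. */
    DWORD written = 0;
    WriteFile(h, text, 13, &written, NULL);

    CloseHandle(h);
    return 0;
}
```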
To answer your question: since you explicitly used char, the string takes up 13 bytes. If you had used wchar_t, it would take 26 bytes. You could also have written const TCHAR* text = _T("Hello, world!"); and then _UNICODE would decide.
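A small sketch of the three declarations side by side; the TCHAR line compiles to one of the other two depending on whether _UNICODE is defined:

```c
#include <stdio.h>
#include <tchar.h>   /* TCHAR, _T() */

int main(void)
{
    const char    narrow[]  =    "Hello, world!";  /* 13 characters, 1 byte each  */
    const wchar_t wide[]    =   L"Hello, world!";  /* 13 characters, 2 bytes each */
    const TCHAR   generic[] = _T("Hello, world!"); /* narrow or wide, per _UNICODE */

    /* sizeof includes the terminating zero character. */
    printf("narrow:  %u bytes\n", (unsigned)sizeof(narrow));  /* 14 = 13 + 1 */
    printf("wide:    %u bytes\n", (unsigned)sizeof(wide));    /* 28 = 26 + 2 */
    printf("generic: %u bytes\n", (unsigned)sizeof(generic)); /* 14 or 28    */
    return 0;
}
```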