I came across two code snippets:
std::wstring str = std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>().from_bytes("some utf8 string");
and,
std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some utf8 string");
Are they both correct ways to convert UTF-8 stored in a std::string to UTF-16 in a std::wstring?
codecvt_utf8_utf16 does exactly what it says: it converts between UTF-8 and UTF-16, both of which are well-understood and portable encodings.

codecvt_utf8 converts between UTF-8 and UCS-2/UCS-4, depending on the size of the given wide-character type. UCS-2 and UTF-16 are not the same thing.

So if your goal is to store genuine, actual UTF-16 in a wchar_t string, then you should use codecvt_utf8_utf16.
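A code point outside the Basic Multilingual Plane makes the difference visible: real UTF-16 has to encode it as a surrogate pair, which UCS-2 cannot represent at all. A minimal sketch of that (C++11/14; note that <codecvt> and std::wstring_convert are deprecated since C++17), using char16_t here so the result is UTF-16 on any platform:

#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main() {
    // "\xF0\x9F\x98\x80" is the UTF-8 encoding of U+1F600, which lies
    // outside the BMP and therefore needs a surrogate pair in UTF-16.
    std::string utf8 = "\xF0\x9F\x98\x80";

    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    std::u16string utf16 = conv.from_bytes(utf8);

    // Prints 2 code units: d83d de00 (the surrogate pair).
    std::cout << utf16.size() << " code units:";
    for (char16_t c : utf16)
        std::cout << ' ' << std::hex << static_cast<unsigned>(c);
    std::cout << '\n';
}

With codecvt_utf8<char16_t> instead, that 4-byte sequence has no UCS-2 representation, so the conversion fails (wstring_convert typically reports this by throwing std::range_error).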
However, if you're trying to write cross-platform code that treats wchar_t as some kind of generic "Unicode" type, you can't. The UTF-16 facet always converts to UTF-16, whereas wchar_t on non-Windows platforms is generally expected to be UTF-32/UCS-4. By contrast, codecvt_utf8 only converts to UCS-2/4, but on Windows wchar_t strings are "supposed" to be full UTF-16.

So you can't write code that satisfies all platforms without some #ifdef or template work. On Windows, you should use codecvt_utf8_utf16; on non-Windows, you should use codecvt_utf8.
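A minimal sketch of that platform split, assuming _WIN32 as the Windows check; the helper name utf8_to_wide is just for illustration:

#include <codecvt>
#include <locale>
#include <string>

// Converts UTF-8 to the platform's native wchar_t encoding:
// UTF-16 on Windows, UTF-32/UCS-4 elsewhere.
std::wstring utf8_to_wide(const std::string& utf8)
{
#ifdef _WIN32
    // wchar_t is 16 bits here and is expected to hold UTF-16.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
#else
    // wchar_t is normally 32 bits here and holds UTF-32/UCS-4.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
#endif
    return conv.from_bytes(utf8);
}

The alternative to the #ifdef is to standardize on std::u16string with codecvt_utf8_utf16<char16_t>, as in the earlier snippet, which gives you UTF-16 on every platform without touching wchar_t.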
Or, better yet, just use UTF-8 internally and find APIs that directly take strings in a specific, known encoding rather than platform-dependent wchar_t stuff.