How to convert a unicode number to a std::wstring?


Is there an easy way to convert a Unicode codepoint number to a std::wstring? e.g. I want to convert U+1E9E (=16785054) to ẞ.

Best answer, by Remy Lebeau:

Depending on the platform on which you are running your code, the encoding of the std::wstring will need to be either UTF-16 (i.e., Windows) or UTF-32 (i.e., most other OSes). Converting a codepoint number to either of those formats is trivial.

On platforms where wchar_t is 32-bit in size, suitable for UTF-32, you can just cast the number as-is to wchar_t and then assign it to your wstring.

On platforms where wchar_t is 16-bit in size, suitable for UTF-16, you will have to do a small bit of math to convert the number to 1 or 2 wchar_t units depending on its value, and then assign that result to your wstring.

For example:

#include <string>

std::wstring CodePointToWString(unsigned int codepoint)
{
    std::wstring str;

    if constexpr (sizeof(wchar_t) > 2) { // 'if constexpr' requires C++17
        // use UTF-32: any codepoint fits in a single wchar_t
        str = static_cast<wchar_t>(codepoint);
    }
    else {
        // use UTF-16
        if (codepoint <= 0xFFFF) {
            // Basic Multilingual Plane: one code unit
            str = static_cast<wchar_t>(codepoint);
        }
        else {
            // supplementary plane: encode as a surrogate pair
            codepoint -= 0x10000;
            str.resize(2);
            str[0] = static_cast<wchar_t>(0xD800 + ((codepoint >> 10) & 0x3FF)); // high surrogate
            str[1] = static_cast<wchar_t>(0xDC00 + (codepoint & 0x3FF));         // low surrogate
        }
    }

    return str;
}

...

std::wstring str = CodePointToWString(0x1E9E);

FYI, U+1E9E is not 16785054; it is 7838. 16785054 would be U+1001E9E instead, which is not a valid codepoint (the maximum is U+10FFFF).