I got some string data from a parameter, such as &#55357;&#56876;.
These are Unicode's UTF-16 surrogate pairs represented as decimal.
How can I convert them to Unicode code points such as "U+1F62C" with the standard library?
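You can easily do it by hand. The algorithm for going from a high Unicode code point to the surrogate pair and back is not that hard. The Wikipedia page on UTF-16 describes it for code points U+10000 to U+10FFFF: 0x10000 is subtracted from the code point, leaving a 20-bit number; the high ten bits are added to 0xD800 to give the high surrogate, and the low ten bits are added to 0xDC00 to give the low surrogate.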
That's just bitwise and, or and shift and can trivially be implemented in C or C++.
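As a minimal sketch of the by-hand route, going from a surrogate pair back to the code point could look like the following. The function name decode_surrogate_pair is a hypothetical helper, not something from the standard library, and the range check is an addition for safety.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not a standard function): combine a UTF-16
// surrogate pair into a single Unicode code point.
std::uint32_t decode_surrogate_pair(std::uint16_t high, std::uint16_t low) {
    // High surrogates lie in 0xD800..0xDBFF, low surrogates in 0xDC00..0xDFFF.
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
        throw std::invalid_argument("not a valid UTF-16 surrogate pair");
    }
    // Each surrogate carries 10 bits of the 20-bit offset above 0x10000.
    const std::uint32_t top = static_cast<std::uint32_t>(high) - 0xD800u;
    const std::uint32_t bottom = static_cast<std::uint32_t>(low) - 0xDC00u;
    return 0x10000u + ((top << 10) | bottom);
}
```

For example, decode_surrogate_pair(55357, 56876) returns 0x1F62C (55357 is 0xD83D and 56876 is 0xDE2C), which can then be printed in hexadecimal after a "U+" prefix.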
Since you said you wanted to use the standard library: what you are asking for is a conversion from two 16-bit UTF-16 surrogates to one 32-bit Unicode code point, so codecvt is your friend, provided you can compile in C++11 mode or higher. Here is an example processing your values on a little-endian architecture:
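One possible sketch of such a conversion uses std::wstring_convert with std::codecvt_utf16. It is not necessarily the code originally posted here. The helper name utf16_units_to_code_points is hypothetical, and instead of relying on the host byte order this version writes the bytes out in big-endian order by hand, which is what std::codecvt_utf16 reads by default.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: convert a sequence of UTF-16 code units
// (surrogate pairs included) into UTF-32 code points.
std::u32string utf16_units_to_code_points(const std::u16string& units) {
    // Serialize each 16-bit unit as two bytes, most significant byte first,
    // so the result does not depend on the endianness of the machine.
    std::string bytes;
    for (char16_t u : units) {
        bytes.push_back(static_cast<char>((u >> 8) & 0xFF));
        bytes.push_back(static_cast<char>(u & 0xFF));
    }
    // codecvt_utf16 expects big-endian input unless std::little_endian
    // is passed as its mode parameter.
    std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
    // from_bytes throws std::range_error on malformed input,
    // for example an unpaired surrogate.
    return conv.from_bytes(bytes);
}
```

Passing the two units 55357 and 56876 should give a one-element string holding 0x1F62C. Note that the <codecvt> header and std::wstring_convert are deprecated as of C++17, so newer compilers may emit deprecation warnings for this code.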
Output is as expected: