Background:
I am building a hash that lets you look up the descriptions shown below by passing in a QString that contains the character itself.
I have the full list of relevant data, which looks something like this:
QHash<QString, QString> lookupCharacterDescription;
...
lookupCharacterDescription.insert("003F","QUESTION MARK");
lookupCharacterDescription.insert("0040","COMMERCIAL AT");
lookupCharacterDescription.insert("0041","LATIN CAPITAL LETTER A");
lookupCharacterDescription.insert("0042","LATIN CAPITAL LETTER B");
...
lookupCharacterDescription.insert("1F648","SEE-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F649","HEAR-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F64A","SPEAK-NO-EVIL MONKEY");
lookupCharacterDescription.insert("1F64B","HAPPY PERSON RAISING ONE HAND");
...
lookupCharacterDescription.insert("FFFD","REPLACEMENT CHARACTER");
lookupCharacterDescription.insert("FFFE","<not a character>");
lookupCharacterDescription.insert("FFFF","<not a character>");
lookupCharacterDescription.insert("FFFFE","<not a character>");
lookupCharacterDescription.insert("FFFFF","<not a character>");
Obviously "1F64B" needs to be wrapped in something here. I have tried things like passing 0x1F64B as a QChar, but I am honestly groping in the dark. I can make it work for lower values such as the Latin letters, but it fails for the five-digit code points.
Questions:
- How do I classify 1F64B? Is this considered UTF-32?
- What can I wrap the value "1F64B" in to produce QString("🙋")?
- Will the same wrapping also work for the lower values?
When you use QString(0x1F64B), it calls QString::QString(QChar ch). Since QChar is a 16-bit type, the value is truncated to 0xF64B, and you get U+F64B instead: a Private Use Area code point (the BMP range U+E000 to U+F8FF) with no standard meaning, which most fonts render as an empty box or as a box showing its hex digits. You may also get an out-of-range warning at that line. If you zoom in on the rendered character, or look at the output in a hex editor, you can see the value F64B. Because 0x1F64B cannot fit into a single 16-bit QChar and must be represented by a surrogate pair, you can't initialize the string that way.
On the other hand, QString("🙋") works because it constructs the string from another string, one that already contains the complete character. You must construct the string from a string like that, or manually from the character's UTF-8 or UTF-16 code units.

As for UTF-32: no. 1F64B is a Unicode code point, conventionally written U+1F64B. UTF-32 is a Unicode encoding that uses 32 bits per code unit. Since you are working with a QString rather than a bare byte array, you don't need to care about its underlying encoding (which is actually UTF-16).
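Because QString stores UTF-16, both the truncation and the surrogate pair described above can be reproduced with the C++ standard library alone. This is a minimal sketch, not Qt's implementation: truncateTo16Bits and toUtf16 are hypothetical helpers, and char16_t / std::u16string stand in for QChar / QString. Qt itself exposes the same arithmetic through QChar::requiresSurrogates, QChar::highSurrogate and QChar::lowSurrogate.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: keep only the low 16 bits, which is what happens
// when 0x1F64B is squeezed into a 16-bit unit such as QChar.
constexpr char16_t truncateTo16Bits(char32_t cp) {
    return static_cast<char16_t>(cp & 0xFFFF);
}

// Hypothetical helper: encode one code point as UTF-16.
// Anything above U+FFFF becomes a high/low surrogate pair.
std::u16string toUtf16(char32_t cp) {
    // Must be a Unicode scalar value: in range and not itself a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));  // fits in one unit
    }
    const char32_t v = cp - 0x10000;                                   // 20 bits remain
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return std::u16string{high, low};
}
```

For example, toUtf16(0x1F64B) yields the two units D83D and DE4B, while truncateTo16Bits(0x1F64B) yields the single, wrong unit F64B.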
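You shouldn't handle the numeric values as strings. Store them as a numeric type instead (for example, a hash keyed by uint), and then build a string containing the character at code point 0x1F64B with QString::fromUcs4.

As for the lower values: yes, this works for them too, since UCS-4, also known as UTF-32, can represent every Unicode code point, from 0x41 ('A') up to 0x1F64B and beyond.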
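Converting the existing table to numeric keys only requires parsing each hex key once. A minimal sketch under these assumptions: the keys are bare hex digits with no "U+" prefix, std::unordered_map stands in for QHash, and parseCodePoint, makeTable and describe are hypothetical names. In Qt, QString::toUInt(&ok, 16) performs the same base-16 parse.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: parse a key such as "1F64B" (hex digits, no "U+") into a code point.
char32_t parseCodePoint(const std::string& hex) {
    std::size_t used = 0;
    const unsigned long value = std::stoul(hex, &used, 16);  // throws on non-hex input
    assert(used == hex.size() && value <= 0x10FFFF);          // whole key parsed, valid range
    return static_cast<char32_t>(value);
}

// Descriptions keyed by numeric code point instead of by hex string
// (standard-library stand-in for a QHash keyed by uint).
using DescriptionTable = std::unordered_map<char32_t, std::string>;

DescriptionTable makeTable() {
    DescriptionTable table;
    table.emplace(parseCodePoint("0041"), "LATIN CAPITAL LETTER A");
    table.emplace(parseCodePoint("1F64B"), "HAPPY PERSON RAISING ONE HAND");
    table.emplace(parseCodePoint("FFFD"), "REPLACEMENT CHARACTER");
    return table;
}

// Look up a description; unknown code points give an empty string.
std::string describe(const DescriptionTable& table, char32_t cp) {
    const auto it = table.find(cp);
    return it == table.end() ? std::string() : it->second;
}
```

To look up a QString that holds the character itself, QString::toUcs4() returns its code points, and the first one can serve as the key.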
Alternatively, you can construct the character from its UTF-16 or UTF-8 code units. U+1F64B is encoded in UTF-16 as D83D DE4B and in UTF-8 as F0 9F 99 8B, so you can pass those units to QString::fromUtf16 or QString::fromUtf8 respectively. If you want to include the string in its literal form in source code, a string literal containing the character itself, or its escaped UTF-8 bytes, will also work.
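The bytes F0 9F 99 8B follow from the UTF-8 rules: a code point above U+FFFF takes four bytes, carrying 3 + 6 + 6 + 6 payload bits after the marker bits. The sketch below is a hypothetical toUtf8 helper for illustration, not a library function; with Qt, the resulting bytes could be handed to QString::fromUtf8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 (1 to 4 bytes),
// returned as raw bytes in a std::string.
std::string toUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                      // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```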
If you have C++11 support, simply use the string-literal prefixes u8, u and U for UTF-8, UTF-16 and UTF-32 respectively.

Mandatory article to understand text and encodings: There Ain't No Such Thing as Plain Text.
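What each prefix produces for U+1F64B can be checked at compile time with sizeof. In this standard C++ sketch (the constant names are illustrative), the same character written as the universal character name \U0001F64B turns out to be four code units in UTF-8, two in UTF-16 (the surrogate pair D83D DE4B) and one in UTF-32.

```cpp
#include <cstddef>

// U+1F64B written as a universal character name in each kind of literal.
// sizeof counts the terminating null, so subtract one element to get the code-unit count.
constexpr std::size_t kUtf8Units  = sizeof(u8"\U0001F64B") - 1;                    // 8-bit units
constexpr std::size_t kUtf16Units = sizeof(u"\U0001F64B") / sizeof(char16_t) - 1;  // 16-bit units
constexpr std::size_t kUtf32Units = sizeof(U"\U0001F64B") / sizeof(char32_t) - 1;  // 32-bit units

// The code units themselves, including the terminating null.
constexpr char16_t kUtf16[] = u"\U0001F64B";  // D83D DE4B 0000
constexpr char32_t kUtf32[] = U"\U0001F64B";  // 0001F64B 00000000

static_assert(kUtf8Units == 4 && kUtf16Units == 2 && kUtf32Units == 1,
              "U+1F64B is 4 UTF-8, 2 UTF-16 and 1 UTF-32 code units");
```

Only the UTF-32 form holds the code point as a single unit, which is why the numeric value 0x1F64B maps directly onto UCS-4.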