At work, I have an issue where I need to find the UTF-8 reference of a composite Unicode character.
The character in question is an "n" with a "^" on top: n̂. This is represented in Unicode by the character "n" (U+006E) followed by the circumflex accent (U+0302).
What I'm looking for is the single reference of this character in UTF-8.
I've been looking all around, but I can't seem to find an answer. I feel stupid, because finding such a simple thing doesn't seem like it should be hard.
Edit: I thought that the composition of "n" and "^" could be mapped to a single UTF-8 code point (I hope I'm using the terminology right). However, you explained to me that this is not the case. Thank you all for the help.
Loïc.
UTF-8 is a byte encoding for a sequence of individual Unicode codepoints. There is no single Unicode codepoint defined for n̂, not even when a Unicode string is normalized in NFC or NFKC formats. As you have noted, n̂ consists of codepoint U+006E LATIN SMALL LETTER N followed by codepoint U+0302 COMBINING CIRCUMFLEX ACCENT. In UTF-8, U+006E is encoded as byte 0x6E, and U+0302 is encoded as bytes 0xCC 0x82.
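If you want to check this yourself, here is a minimal Python sketch using the standard unicodedata module. The helper name describe is just an illustration, not something from the question. It normalizes a string to NFC, then lists the resulting codepoints and their UTF-8 bytes. For contrast it also runs on "e" followed by U+0302, which does have a precomposed codepoint (U+00EA).

```python
import unicodedata

def describe(text):
    """Return (codepoint labels, UTF-8 bytes as hex) for text after NFC normalization."""
    nfc = unicodedata.normalize("NFC", text)
    code_points = [f"U+{ord(ch):04X}" for ch in nfc]
    utf8_hex = " ".join(f"{b:02X}" for b in nfc.encode("utf-8"))
    return code_points, utf8_hex

# "n" + COMBINING CIRCUMFLEX ACCENT: NFC leaves it as two codepoints.
n_hat = describe("n\u0302")
# "e" + COMBINING CIRCUMFLEX ACCENT: NFC composes it into a single codepoint.
e_hat = describe("e\u0302")

print(n_hat)  # (['U+006E', 'U+0302'], '6E CC 82')
print(e_hat)  # (['U+00EA'], 'C3 AA')
```

So the closest thing to a "UTF-8 reference" for n̂ is the three-byte sequence 6E CC 82, which is simply the encoding of the two codepoints one after the other.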