How to find UTF-8 reference of a composite unicode character

Asked by Loïc N. on 28 July 2025 at 21:48

At work, I have an issue where I need to find the UTF-8 reference of a composite Unicode character.

The character in question is an "n" with a "^" on top: n̂. In Unicode it is represented by the character "n" (U+006E) followed by the combining circumflex accent (U+0302).

What I'm looking for is the single reference of this character in UTF-8.

I've been looking all around, but I can't seem to find an answer to this. I feel stupid, because finding such a simple thing doesn't seem like it should be hard.

Edit: I thought that the composition of "n" and "^" could be mapped to a single UTF-8 code point (I hope I'm using the terminology right). However, you explained to me that this is not the case. Thank you all for the help.

Loïc.


There are 2 best solutions below

Remy Lebeau On 09 June 2015 at 19:32 BEST ANSWER

UTF-8 is a byte encoding for a sequence of individual Unicode codepoints. There is no single Unicode codepoint defined for n̂, not even when a Unicode string is normalized in NFC or NFKC formats. As you have noted, n̂ consists of codepoint U+006E LATIN SMALL LETTER N followed by codepoint U+0302 COMBINING CIRCUMFLEX ACCENT. In UTF-8, U+006E is encoded as byte 0x6E, and U+0302 is encoded as bytes 0xCC 0x82.
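To check this yourself, here is a minimal Python 3 sketch (the variable names are just for illustration) that lists each codepoint in the sequence with its name and its own UTF-8 bytes, then encodes the whole string:

```python
import unicodedata

# 'n' followed by COMBINING CIRCUMFLEX ACCENT, written with an escape
# so the two codepoints are visible in the source.
s = "n\u0302"

# Show each codepoint, its Unicode name, and the bytes it encodes to.
for ch in s:
    encoded = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", encoded)

# The UTF-8 form of the whole sequence is just the concatenation.
utf8_bytes = s.encode("utf-8")
print(utf8_bytes)  # b'n\xcc\x82'
```

So the closest thing to a "UTF-8 reference" for n̂ is the three-byte sequence 0x6E 0xCC 0x82.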

Joe On 09 June 2015 at 13:33

If you want the string as composed as possible, then you want it in NFC (Normalization Form C, i.e. canonical composition; see Unicode equivalence). You can do this in Python, as in this example:

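#!/usr/bin/python3

import unicodedata

# 'n\u0303' (n + combining tilde) and 'n\u0302' (n + combining circumflex)
for s in ['Jalapen\u0303o', 'n\u0302']:
  print(s)
  print(ascii(s))
  # NFC composes where a precomposed code point exists; NFD decomposes.
  print('NFC:', ascii(unicodedata.normalize('NFC', s)))
  print('NFD:', ascii(unicodedata.normalize('NFD', s)))
  print('')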

This will give you:

Jalapeño
'Jalapen\u0303o'
NFC: 'Jalape\xf1o'
NFD: 'Jalapen\u0303o'

n̂
'n\u0302'
NFC: 'n\u0302'
NFD: 'n\u0302'

As you can see, while 'ñ' has both a composed form (U+00F1) and a decomposed form (U+006E U+0303), 'n̂' does not. Its only form is decomposed, as two separate code points.
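If you need to test other base/accent pairs the same way, a small helper along these lines might do. It is only a sketch: `compose_pair` is a made-up name, and it relies on NFC alone, so it will also return None for the few precomposed characters that Unicode excludes from composition.

```python
import unicodedata

def compose_pair(base, mark):
    """Return the single code point NFC produces for base + mark,
    or None if NFC still leaves more than one code point."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None

print(ascii(compose_pair("n", "\u0303")))  # '\xf1' (n with tilde composes)
print(ascii(compose_pair("n", "\u0302")))  # None (no precomposed n with circumflex)
```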
