How to understand this spec text?

Question


124 Views Asked by defaultprogr At 07 June 2025 at 10:14

I want to improve my knowledge of Go by reading the Go specification, but English isn't my native language, and I do not fully understand what the following text means:

Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the unqualified term character to refer to a Unicode code point in the source text.

With reference to the above text, what do the following parts mean?

The text is not canonicalized
Single accented code
Unqualified term character to refer to a Unicode code point in the source text

If questions of this type are not suitable for this site, please advise a more suitable place to ask such questions.

Original Q&A


**Adam Smith** · Accepted Answer

It's important that you understand a particular facet of the Unicode standard first. There are essentially two ways to represent an accented character like ë. One is the single code point U+00EB (Latin Small Letter E with Diaeresis). The other is a sequence of two code points, ë, made up of U+0065 (Latin Small Letter E, a regular letter e) followed by U+0308 (Combining Diaeresis).
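As a rough illustration of the two representations, the sketch below lists the code points in each form. The helper name `codePoints` is made up for this example, and the strings are written with `\u` escapes so that an editor cannot silently convert one form into the other.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints (a hypothetical helper) lists the Unicode code points in s
// using the usual "U+XXXX" notation.
func codePoints(s string) []string {
	var out []string
	for _, r := range s { // ranging over a string yields runes, i.e. code points
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	precomposed := "\u00eb" // one code point: e with diaeresis
	combined := "e\u0308"   // two code points: e, then combining diaeresis

	for _, s := range []string{precomposed, combined} {
		// %+q prints the string with non-ASCII characters escaped.
		fmt.Printf("%+q: %d bytes, %d code points %v\n",
			s, len(s), utf8.RuneCountInString(s), codePoints(s))
	}
}
```

Both strings usually render as the same glyph on screen, but the first holds one code point and the second holds two.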

In effect, these two forms represent the same character; they are merely constructed differently. Unicode calls this canonical equivalence. Normalization (or canonicalization) is the process of converting equivalent sequences into one agreed form so that they compare as equal.

The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter

This means that Go does not normalize source text, so the two accented letters above, ë (U+00EB) and ë (U+0065 followed by U+0308), are not treated as the same. The first is the "single accented code point" U+00EB, and the second is the letter e followed by a combining diacritic, which counts as two code points.
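A small sketch of what "distinct" means in practice, using string values rather than raw source text: Go compares strings byte by byte and applies no normalization, so the two forms are unequal and act as different map keys. The helper `countDistinct` is a made-up name for this example, and the `\u` escapes are assumed to stand in for typing the characters directly.

```go
package main

import "fmt"

// countDistinct (a hypothetical helper) returns how many distinct strings
// keys contains. Map keys are compared exactly, with no Unicode normalization.
func countDistinct(keys ...string) int {
	seen := make(map[string]struct{})
	for _, k := range keys {
		seen[k] = struct{}{}
	}
	return len(seen)
}

func main() {
	precomposed := "\u00eb" // single code point U+00EB
	combined := "e\u0308"   // U+0065 followed by U+0308

	fmt.Println(precomposed == combined)              // false
	fmt.Println(countDistinct(precomposed, combined)) // 2
}
```

If the two forms need to compare as equal, the text has to be normalized explicitly first; the `golang.org/x/text/unicode/norm` package is one option for that.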

For simplicity, this document will use the unqualified term character to refer to a Unicode code point in the source text

It's just saying, "In this document, the word 'character' means a single Unicode code point." "Unqualified" means the bare word "character" with no modifier in front of it, as opposed to a qualified term such as "Unicode character" or "accented character". This is shorthand for ease of reading; it does not add anything to the language itself.
