I want to improve my knowledge of Golang by reading the Golang specification, but English isn't my native language, and I do not fully understand what the following text means:
Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the unqualified term character to refer to a Unicode code point in the source text.
With reference to the above text, what do the following parts mean?
- The text is not canonicalized
- Single accented code
- Unqualified term character to refer to a Unicode code point in the source text
If questions of this type are not suitable for this site, please advise a more suitable place to ask such questions.
It's important that you understand a particular facet of the Unicode standard first. There are essentially two ways to represent an accented character like ë. One is the single code point U+00EB (Latin Small Letter E with Diaeresis). The other is the two-code-point sequence ë, which is the simple code point U+0065 (Latin Small Letter E, a regular letter e) followed by the code point U+0308 (Combining Diaeresis).
In effect, these two characters are the same; they are merely constructed differently. This leads to a concept called Unicode equivalence, under which normalization (or canonicalization) treats those two sequences of code points as equivalent.
"The text is not canonicalized" means that Go does not perform this normalization, so the two accented letters ë and ë above are not equivalent in the language spec. The first one is the "single accented code point" U+00EB, and the latter is the letter e combined with a combining diacritic.
As for the "unqualified term character", it's just saying "We're defining for this document only the term 'character' to mean a single Unicode code point." This is a convention for ease of reading, not a definition of anything in the language itself; "unqualified" simply means the bare word "character" with no modifier in front of it.
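To see the difference concretely, here is a minimal sketch in Go that uses only the standard library. The names `composed`, `decomposed`, and `codePoints` are made up for this illustration; the strings are written with `\u` escapes so that an editor cannot silently convert one form into the other.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Two spellings of the same visible letter (illustrative names).
const (
	composed   = "\u00eb"  // U+00EB: one precomposed code point
	decomposed = "e\u0308" // U+0065 followed by U+0308: two code points
)

// codePoints returns the Unicode code points (runes) of s in order.
// These are what the spec calls "characters".
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	for _, s := range []string{composed, decomposed} {
		// Print the string, its UTF-8 byte length, its number of
		// code points, and the code points themselves.
		fmt.Printf("%q: %d bytes, %d code points, %U\n",
			s, len(s), utf8.RuneCountInString(s), codePoints(s))
	}

	// Go compares strings byte by byte and applies no normalization,
	// so the two spellings are different strings.
	fmt.Println("equal:", composed == decomposed)
}
```

The last line reports `equal: false`, even though both strings usually render as the same letter on screen. If you need the two forms to compare equal, you have to normalize them yourself, for example with the `golang.org/x/text/unicode/norm` package, which is not part of the standard library.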