I'm learning about Unicode basics and I came across this passage:
"The Unicode standard describes how characters are represented by code points. A code point is an integer value, usually denoted in base 16. In the standard, a code point is written using the notation U+12ca to mean the character with value 0x12ca (4810 decimal)."
I have three questions from here.
- What does the ca stand for? In some places I've seen it written as just U+12. What's the difference?
- Where did the 0 in 0x12ca come from? What does it mean?
- How does the value 0x12ca become 4810 decimal?
It's my first post here and I would appreciate any help! Have a nice day y'all!!
It stands for the hexadecimal digits c and a.
Either U+12 is a mistake, or it is another (IMO sloppy / ambiguous) way of writing U+0012 ... which is a different Unicode code point from U+12ca.
As for 0x12ca: that is a different notation. It is the hexadecimal (integer) literal notation used in various programming languages, e.g. C, C++, Java and so on. It represents a number ... not necessarily a Unicode code point.
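For illustration, here is a minimal Python sketch (Python is my choice here; it also accepts 0x literals) showing that 0x12ca is just an integer, and that U+0012 and U+12ca really are two different code points. The variable names are made up for this example.

```python
import unicodedata

# 0x12ca is an ordinary integer literal written in base 16.
number = 0x12ca
print(number)

# chr() maps an integer code point to the corresponding character.
long_form = chr(0x12CA)   # the code point written U+12CA
short_form = chr(0x0012)  # the code point written U+0012

# Two different code points: one is a letter, the other a control character.
print(f"U+{ord(long_form):04X}", unicodedata.name(long_form))
print(f"U+{ord(short_form):04X}", unicodedata.category(short_form))
print(long_form == short_form)
```

The last line prints False, which is why dropping digits from the U+ notation changes which character you are talking about.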
The 0x is just part of the notation. (It "comes from" the respective language specifications ...)
The 0x prefix means that the remaining characters are hexadecimal (aka base 16) digits, where:
- a or A represents 10,
- b or B represents 11,
- c or C represents 12,
- d or D represents 13,
- e or E represents 14,
- f or F represents 15.
So 0x12ca is 1 x 16^3 + 2 x 16^2 + 12 x 16^1 + 10 x 16^0 = 4096 + 512 + 192 + 10, which is 4810.
(Do the arithmetic yourself to check. Converting between base 10 and base 16 is simple high-school mathematics.)
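If you would rather let a computer check the arithmetic, here is a rough Python sketch of the same place-value calculation. The helper hex_to_int and the constant HEX_DIGITS are hypothetical names written for this answer, not library functions; the built-in int(text, 16) does the same job.

```python
HEX_DIGITS = "0123456789abcdef"  # position in this string = value of the digit


def hex_to_int(text):
    """Convert a hex string such as '0x12ca' or '12CA' to an integer."""
    digits = text.lower()
    if digits.startswith("0x"):
        digits = digits[2:]  # the 0x prefix is notation only, not a digit
    value = 0
    for ch in digits:
        # Shift what we have one hex place to the left, then add the new digit.
        value = value * 16 + HEX_DIGITS.index(ch)
    return value


print(hex_to_int("0x12ca"))                              # 4810
print(1 * 16**3 + 2 * 16**2 + 12 * 16**1 + 10 * 16**0)   # 4810
print(int("12ca", 16))                                   # 4810
```

All three lines print the same number, because they are three ways of writing the same sum of digit times power of 16.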