Go rune literal for high positioned emojis


How do we write an emoji as a rune literal when its code point is beyond (I think) U+265F?

a1 := '\u2665'

  • this works

a2 := '\u1F3A8'

  • this gives the error: invalid character literal, more than one character.

Is there a way to represent higher positioned emojis as rune literals?

https://unicode.org/emoji/charts/full-emoji-list.html


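You may use the \U escape followed by exactly 8 hex digits, which give the hexadecimal value of the Unicode code point. The \u escape takes exactly 4 hex digits, so '\u1F3A8' is parsed as '\u1F3A' followed by '8'. That is two characters, hence the error. This is detailed in Spec: Rune literals: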

There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.

For example:

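package main

import "fmt"

func main() {
	a1 := '\u2665' // U+2665 fits in four hex digits
	fmt.Printf("%c\n", a1)
	a2 := '\U0001F3A8' // U+1F3A8 needs the eight-digit \U form
	fmt.Printf("%c\n", a2)
}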

Which outputs (try it on the Go Playground):

♥
🎨

Note (response to @torek):

I believe the Go authors chose to require exactly 4 and 8 hex digits because this allows the exact same escape sequences to be used both in rune literals and inside interpreted string literals. For example, if you want a string that contains 2 runes, one with code point 0x0001F3A8 and the other being the character '4', it could look like this:

s := "\U0001F3A84"

If the spec did not require exactly 8 hex digits, it would be ambiguous whether the last '4' is part of the code point or a separate rune of the string, so you would have to break the string into a concatenation like "\U1F3A8" + "4".

Spec: String literals:

Interpreted string literals are character sequences between double quotes, as in "bar". Within the quotes, any character may appear except newline and unescaped double quote. The text between the quotes forms the value of the literal, with backslash escapes interpreted as they are in rune literals (except that \' is illegal and \" is legal), with the same restrictions. The three-digit octal (\nnn) and two-digit hexadecimal (\xnn) escapes represent individual bytes of the resulting string; all other escapes represent the (possibly multi-byte) UTF-8 encoding of individual characters. Thus inside a string literal \377 and \xFF represent a single byte of value 0xFF=255, while ÿ, \u00FF, \U000000FF and \xc3\xbf represent the two bytes 0xc3 0xbf of the UTF-8 encoding of character U+00FF.
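The byte-versus-character distinction in the last quoted sentence can be verified with a short standalone program. It only restates the spec's own examples, printing each string's bytes with the space-flagged %x verb:

```go
package main

import "fmt"

func main() {
	// Octal (\nnn) and two-digit hex (\xnn) escapes produce single raw bytes.
	fmt.Println(len("\377"), "\377" == "\xFF") // 1 true

	// The literal character, \u, \U and the explicit UTF-8 bytes all
	// produce the same two-byte string for U+00FF.
	forms := []string{"ÿ", "\u00FF", "\U000000FF", "\xc3\xbf"}
	for _, f := range forms {
		fmt.Printf("% x\n", f) // "% x" prints bytes separated by spaces: c3 bf
	}

	// The single byte 0xFF is therefore not the same string as U+00FF.
	fmt.Println("\xFF" == "\u00FF") // false
}
```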