I know that I can escape a basic Unicode character in Ruby with the \uNNNN
escape sequence. For example, for a smiling face U+263A (☺) I can use the string literal "\u263A".
How do I escape Unicode characters greater than U+FFFF that fall outside the basic multilingual plane, like a winking face: U+1F609 (😉)?
Using the surrogate pair form like in Java doesn't work; it results in an invalid string that contains the individual surrogate code points:
s = "\uD83D\uDE09" # => "\xED\xA0\xBD\xED\xB8\x89"
s.valid_encoding? # => false
You can use the escape sequence \u{XXXXXX}, where XXXXXX is between 1 and 6 hex digits. The braces can also contain multiple runs separated by single spaces or tabs to encode multiple characters.
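As a minimal sketch of the braced form (the variable names `wink`, `padded`, and `pair` are just illustrative), something like this should work on any Ruby that supports \u{...}:

```ruby
# One code point above U+FFFF, written with the braced escape.
wink = "\u{1F609}"
wink.valid_encoding?   # => true
wink.length            # => 1 (one character)
wink.bytes.length      # => 4 (four UTF-8 bytes)
wink.ord == 0x1F609    # => true

# Leading zeros are allowed, up to six hex digits in total.
padded = "\u{01F609}"
padded == wink         # => true

# Several code points inside one pair of braces, separated by single spaces.
pair = "\u{263A 1F609}"
pair.codepoints == [0x263A, 0x1F609]   # => true
```

Strings built from \u escapes are UTF-8-encoded, so `valid_encoding?` returns true here, unlike the surrogate pair attempt in the question.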
You could also use byte escapes to write a literal that contains the UTF-8 encoding of the character. That is less convenient, and if the file's source encoding is not UTF-8, the resulting string is not necessarily UTF-8-encoded.
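A rough sketch of the byte-escape approach, assuming you want the result tagged as UTF-8 no matter what the source encoding is (the names `raw` and `utf8` are made up for the example):

```ruby
# The four UTF-8 bytes of U+1F609, written as byte escapes.
# The literal takes on the source file's encoding, which is UTF-8 by
# default in Ruby 2.0 and later but may differ if a magic comment says so.
raw = "\xF0\x9F\x98\x89"

# force_encoding relabels the bytes as UTF-8 without transcoding them.
# dup keeps this working when string literals are frozen.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)

utf8.valid_encoding?   # => true
utf8 == "\u{1F609}"    # => true
```

The explicit `force_encoding` is only needed when the file is not UTF-8; the \u{...} form avoids the question entirely.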