From Core Java, vol. 1, 9th ed., p. 69:
The character ℤ requires two code units in the UTF-16 encoding. Calling
String sentence = "ℤ is the set of integers"; // for clarity; not in book
char ch = sentence.charAt(1)
doesn't return a space but the second code unit of ℤ.
But it seems that sentence.charAt(1) does return a space. For example, the condition of the if statement in the following code evaluates to true.
String sentence = "ℤ is the set of integers";
if (sentence.charAt(1) == ' ')
    System.out.println("sentence.charAt(1) returns a space");
Why?
I'm using JDK SE 1.7.0_09 on Ubuntu 12.10, if it's relevant.
It sounds like the book is saying that 'ℤ' is not in the basic multilingual plane, but in fact it is.
Java uses UTF-16, with surrogate pairs for characters that are not in the basic multilingual plane. Since 'ℤ' (U+2124) is in the basic multilingual plane, it is represented by a single code unit. In your example
sentence.charAt(0) will return 'ℤ', and sentence.charAt(1) will return ' '. A character represented by surrogate pairs has two code units making up the character.
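The difference can be checked directly. The following is a minimal sketch, not code from the book: the class name CodeUnitDemo and its helper methods are made up for illustration, and U+1D56B is just one supplementary character picked as an example of something outside the basic multilingual plane. It builds the same sentence with each character in front and looks at what sits at index 1.

```java
public class CodeUnitDemo {
    // Builds "<character> is the set of integers" for the given code point.
    public static String sentenceFor(int codePoint) {
        return new String(Character.toChars(codePoint)) + " is the set of integers";
    }

    // Describes the UTF-16 code unit that charAt(1) returns.
    public static String describeSecondChar(int codePoint) {
        char second = sentenceFor(codePoint).charAt(1);
        if (Character.isLowSurrogate(second)) {
            return "low surrogate";
        }
        return second == ' ' ? "space" : "other";
    }

    // Returns the second code point, skipping a whole surrogate pair if needed.
    public static int secondCodePoint(int codePoint) {
        String sentence = sentenceFor(codePoint);
        return sentence.codePointAt(sentence.offsetByCodePoints(0, 1));
    }

    public static void main(String[] args) {
        int bmp = 0x2124;            // the character from the question
        int supplementary = 0x1D56B; // assumed example outside the BMP

        System.out.println(Character.charCount(bmp));           // 1
        System.out.println(describeSecondChar(bmp));            // space

        System.out.println(Character.charCount(supplementary)); // 2
        System.out.println(describeSecondChar(supplementary));  // low surrogate

        // Indexing by code point finds the space in both cases.
        System.out.println(secondCodePoint(bmp) == ' ');           // true
        System.out.println(secondCodePoint(supplementary) == ' '); // true
    }
}
```

All the methods used here (Character.toChars, Character.charCount, Character.isLowSurrogate, String.codePointAt, String.offsetByCodePoints) have been available since Java 5, so this should behave the same on JDK 7.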
If the string began with a surrogate pair, sentence.charAt(0) would return the first code unit, and sentence.charAt(1) would return the second code unit. See http://docs.oracle.com/javase/6/docs/api/java/lang/String.html: