Java String intern function and ==


I have recently been studying the HotSpot JVM. While learning about the string constant pool and String.intern(), I ran into a very strange situation. After reading many answers I still can't explain it, so I'm posting it here for discussion.

    public static void main(String[] args) {
        String s1 = new String("12") + new String("21");
        s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2); // true
    }

    public static void main(String[] args) {
        String s1 = new String("12") + new String("21");
        // s1.intern();
        String s2 = "1221";
        System.out.println(s1 == s2); // false
    }

The results are from Java 8.

The only difference between the two snippets is whether s1.intern() is called.

Here is the documentation of the intern method:

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

Here is my understanding:

  1. Looking at the bytecode, "12", "21" and "1221" all appear in the class file's constant pool.
  2. When the class is loaded, the constant pool of the class file is loaded into the run-time constant pool, so the string pool contains "12", "21" and "1221".
  3. new String("12") creates a String instance on the heap, distinct from the "12" in the string pool. The same goes for new String("21").
  4. The "+" operator is compiled into a StringBuilder with calls to its append and toString methods, as the bytecode shows.
  5. toString calls new String, so s1 is a String instance "1221" on the heap.
  6. s1.intern() looks in the string pool, finds that "1221" is already there, and does nothing. Besides, the return value is not used, so the call has no effect on s1.
  7. String s2 = "1221" simply loads the "1221" instance from the string pool. In the bytecode this is ldc #11, where #11 is the index of "1221" in the constant pool.
  8. The "==" operator compares the addresses of reference types. s1 points to the instance on the heap and s2 points to the instance in the string pool. How can the two be equal?

My questions:

  1. What exactly do s1 and s2 point to?
  2. Why does calling intern() change the behavior, even though the return value is not used?

Here are my guesses:

  1. The string pool is not initialized when the class is loaded. Some answers say that s1.intern() is the first time "1221" is put into the string pool. But then how do you explain that "1221" is in the constant pool of the class file? Is there any specification of when the string pool is populated?

  2. Another explanation is that intern() just saves a reference to the instance on the heap. But then the references s1 and s2 would still be different: s1 points to the heap, s2 points to the string pool, and the string pool points to the heap. A reference is different from a reference to a reference.

DracoYu On

I am the questioner.

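Thanks to the discussion with @Sweeper and @user16320675, I have a new understanding of this problem, which I am sharing here.

The mistakes were in points 2 and 6 of my understanding: the string pool is not populated when the class is loaded. The call s1.intern() is what first adds "1221" to the string pool, and what it adds is a reference to the very instance that s1 points to. String s2 = "1221" then behaves differently depending on whether "1221" already exists in the string pool. If it does, s2 receives the pooled reference, which is the same object as s1. If it does not, a new instance is created and pooled, so s1 and s2 are different objects.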
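The ordering effect described above can be reproduced in one small program. This is a minimal sketch, not code from the original discussion: the class name InternOrderDemo and the literals "intern-first" and "literal-first" are made up for illustration, and it assumes a HotSpot JVM (Java 7 or later) that resolves string literals lazily, on first use.

```java
class InternOrderDemo {
    // Each experiment is only meaningful the first time its literal is used,
    // so the results are computed once, when the class is initialized.
    static final boolean INTERN_FIRST = internFirst();
    static final boolean LITERAL_FIRST = literalFirst();

    // intern() runs before the literal is first loaded.
    private static boolean internFirst() {
        String s1 = new String("intern-") + new String("first"); // new heap instance
        s1.intern();               // pool has no "intern-first" yet, so s1 itself is pooled
        String s2 = "intern-first"; // first use of the literal finds the pooled s1
        return s1 == s2;
    }

    // The literal is loaded without any prior intern() call.
    private static boolean literalFirst() {
        String s1 = new String("literal-") + new String("first"); // new heap instance
        String s2 = "literal-first"; // first use of the literal creates and pools a new instance
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println("intern() before literal: " + INTERN_FIRST);
        System.out.println("literal without intern(): " + LITERAL_FIRST);
    }
}
```

On a JVM like the one in the question, the first line is expected to report true and the second false. Running either experiment a second time in the same JVM would give false, because by then the pool already holds an instance from the first run.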

To explain this properly, I will first define the key concepts involved.

Key concepts

  • Constant pool: a data structure in the class file (bytecode) that stores the constants, strings, classes, fields, methods, interfaces, parameter types, etc. used in the source code. It lives in the class file on disk.
  • Runtime constant pool: the in-memory form of the constant pool while the program is running. When a class is loaded, its constant pool data is loaded into the JVM method area to form the runtime constant pool.
  • CONSTANT_String_info: a constant pool entry that represents a string literal from the source code. It refers to a CONSTANT_Utf8_info entry that holds the literal's Unicode sequence.
  • String pool: in JDK 8, a table of references to String instances on the heap, used to look up and reuse String instances that have been interned.
  • ldc #5: pushes constant No. 5 from the runtime constant pool onto the operand stack. The first time a given string literal is loaded this way, the JVM checks whether the string pool already has a matching String instance. If so, its reference is pushed onto the stack; if not, a String instance is created, added to the string pool, and its reference is pushed.
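The effect of these definitions can be observed from plain Java code. The following is a small sketch with made-up names (LiteralPoolDemo, PREFIX, "pool-demo"); it relies only on the language rule that string literals and compile-time constant expressions are interned, while strings concatenated at run time are new objects.

```java
class LiteralPoolDemo {
    // A constant variable: javac can fold expressions that use it.
    static final String PREFIX = "pool-";

    // Two occurrences of the same literal resolve to the same pooled instance.
    static boolean sameLiteral() {
        String a = "pool-demo";
        String b = "pool-demo";
        return a == b; // true
    }

    // PREFIX + "demo" is a compile-time constant, so the class file
    // contains the single literal "pool-demo" for it.
    static boolean constantExpression() {
        String c = PREFIX + "demo";
        return c == "pool-demo"; // true
    }

    // Concatenation with a non-constant operand happens at run time
    // and produces a new heap instance that is not pooled.
    static boolean runtimeConcat() {
        String part = new String("pool-");
        String d = part + "demo";
        return d == "pool-demo"; // false
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());        // true
        System.out.println(constantExpression()); // true
        System.out.println(runtimeConcat());      // false
    }
}
```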

Cause of the misunderstanding

The mistake comes from misunderstanding the relationship between the string pool and the constant pool (from here on, "constant pool" refers to both the constant pool and the runtime constant pool).

Although it is often called the string constant pool, the string pool is not part of the constant pool, so it is not populated when a class is loaded. In JDK 6, both the string pool and the constant pool were located in the permanent generation, which made them look related. In JDK 8, the string pool is in the heap (it was moved there in JDK 7). It belongs less to the constant pool than to the String class: it can be thought of as a private member variable of the String class, although it is not visible in the String source code.

Once a String instance in the string pool has been created, its character data cannot be changed. Any operation that appears to modify an existing String instance produces a new String instance instead. Because pooled strings behave like constants in this way, the pool is usually called the string constant pool. To avoid confusing the string pool with the constant pool, I try to say "string pool" rather than "string constant pool".

Another concept that is easily confused with the string pool is the CONSTANT_String_info entry in the constant pool. String literals are stored in the class file as Unicode sequences and are loaded into the runtime constant pool when the class is loaded. But this is fundamentally different from the string pool: a CONSTANT_String_info only describes a Unicode sequence, while the string pool holds String instances. A String instance contains not only the Unicode sequence but also other fields, such as hash, and the String class provides many methods that cannot be called on a CONSTANT_String_info. The corresponding String instance is created by constructing a String from the Unicode sequence that the CONSTANT_String_info describes.

Michael Gantman On

Here is a short explanation. First, the == operator is true only if the two compared strings are actually the same instance of the String class. For two different String instances that hold the same content, the result is false. So if you want to compare the content of two strings, you must use the equals() method of the String class. Now consider the following code:

    String s1 = "test";
    // s1.intern();
    String s2 = "test";
    System.out.println(s1 == s2); // true

Even if you don't call s1.intern(), string literals are interned automatically, so s2 is assigned the same instance as s1. The Java Language Specification guarantees this for string literals, which is why s1 == s2 is true here with or without the explicit intern() call. Now consider the following code:

    String s1 = "test";
    s1.intern();
    String s2 = new String("test");
    System.out.println(s1 == s2); // false

This is because new String("test") forces the creation of a new String instance, regardless of what already exists in the internal pool.
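To tie the two snippets together, here is a short sketch showing that the return value of intern() is what gives you the pooled instance. The class name InternReturnDemo and its method names are illustrative only.

```java
class InternReturnDemo {
    // new String(...) always creates a fresh instance, distinct from the pooled literal.
    static boolean copyIsLiteral() {
        String literal = "test";
        String copy = new String("test");
        return copy == literal; // false
    }

    // intern() returns the pooled instance, which is the one the literal refers to.
    static boolean internedCopyIsLiteral() {
        String literal = "test";
        String copy = new String("test");
        return copy.intern() == literal; // true
    }

    // equals() compares content, so it does not care which instance is used.
    static boolean copyEqualsLiteral() {
        return new String("test").equals("test"); // true
    }

    public static void main(String[] args) {
        System.out.println(copyIsLiteral());         // false
        System.out.println(internedCopyIsLiteral()); // true
        System.out.println(copyEqualsLiteral());     // true
    }
}
```

Calling intern() without using its result never changes what the original variable points to; the variable only refers to the pooled instance if you assign the result back.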

Oleg Cherednik On

    String one = new String("abc");
    String two = new String("abc");

    boolean res1 = one == two;      // false -> two different objects
    boolean res2 = one.equals(two); // true -> content identical

    // Put the string into the StringPool (if not already there)
    // and get the object from the StringPool back
    one = one.intern();
    two = two.intern();
    boolean res3 = one == two;      // true -> same object from the StringPool
    boolean res4 = one.equals(two); // true -> content identical

    // Second example (a separate method or scope)
    // The literals "12" and "21" are put into the StringPool;
    // each new String(...) creates a separate object in the heap
    String one = new String("12");
    String two = new String("21");

    // Concatenate two strings at run time:
    // the result is a new object in the heap, NOT in the StringPool
    String joined = one + two;

    // Concatenate two strings at run time
    // and copy the result into yet another object in the heap
    String three = new String(one + two);

    // Put string literal into the StringPool (if not already there) and retrieve it back
    String four = "1221";

    boolean res1 = joined == four; // false -> `joined` is in Heap,
                                   // `four` is from StringPool
    boolean res2 = three == four;  // false -> `three` is in Heap,
                                   // `four` is from StringPool

    // "1221" is already in the StringPool, so intern() returns that object
    three = three.intern();

    boolean res3 = three == four;  // true -> both objects are from StringPool