Underlying mechanism of String pooling in Java?


I was curious as to why Strings can be created without a call to new String(), as the API mentions it is an Object of class java.lang.String

So how are we able to use String s="hi" rather than String s=new String("hi")?

This post clarified the use of == operator and absence of new and says this is due to String literals being interned or taken from a literal pool by the JVM, hence Strings are immutable.

On seeing a statement such as

String s="hi"

for the first time, what really takes place?

  1. Does the JVM replace it with String s=new String("hi"), wherein an Object is created and "hi" is added to the String literal pool, so that subsequent statements such as String s1="hi" are served from the pool?

  2. Is this how the underlying mechanism operates? If so, then is

    String s=new String("Test");
    String s1="Test";
    

    the same as

    String s="Test";
    String s1="Test";
    

    in terms of memory utilization and efficiency?

  3. Also, is there any way by which we can access the String Pool to check how many String literals are present in it, how much space is occupied, etc.?


There are 7 best solutions below

7
On BEST ANSWER
  1. String s="hi": for the first time, what really takes place?

Does the JVM replace it with String s=new String("hi"), wherein an Object is created and "hi" is added to the String literal pool, so that subsequent statements such as String s1="hi" are served from the pool?

No. What really happens is that String literals are resolved at compile time: the compiler writes them into the class file's constant pool. The JVM then interns them (adds them to the String constant pool), either when the class is loaded/initialized or lazily on first use, which makes them available to all classes within the JVM. Note that even if a String with the value "hi" is already in the String constant pool, new String("hi") creates another String on the heap and returns a reference to it.

  2. Is
 String s=new String("Test"); 
 String s1="Test"; 

the same as

 String s="Test"; 
 String s1="Test"; 

in terms of memory utilization and efficiency?

No. In the first case, two "Test" Strings are created: one is added to the String constant pool (assuming it is not already present there) and the other is created on the heap by new. The heap copy can be garbage-collected once it is no longer referenced. In the second case, only one String is present in the String constant pool, and there are two references to it (s and s1).

  3. Also, is there any way by which we can access the String Pool to check how many String literals are present in it, the space occupied, etc., from the program or from any monitoring tool?

I don't think we can see the contents of the String constants pool. We can merely assume and confirm the behavior based on our assumptions.
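To confirm the behavior described for the two snippets above, a reference comparison is enough. The following is only a minimal sketch; the class and method names are made up for illustration.

```java
// Minimal sketch; StringIdentityDemo and its methods are hypothetical names.
class StringIdentityDemo {

    // Two identical literals resolve to the same pooled String object.
    static boolean literalsShareInstance() {
        String s = "Test";
        String s1 = "Test";
        return s == s1;
    }

    // new String(...) always allocates a separate object, even though
    // the pooled "Test" already exists; the contents are still equal.
    static boolean newStringIsSeparateObject() {
        String s = new String("Test");
        String s1 = "Test";
        return s != s1 && s.equals(s1);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());     // true
        System.out.println(newStringIsSeparateObject()); // true
    }
}
```

Note that == is used here only to inspect object identity; for comparing contents, equals() remains the right tool.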

2
On

The String pool is a pool of strings stored on the heap. For example:

String s="Test";
String s1="Test";    

both refer to a single "Test" object stored on the heap, so s1 == s, while

String s=new String("Test");

creates an object that is also stored on the heap but is distinct from the one that s1 and s refer to above.

0
On
  1. Kind of, but not exactly.
    String constants are created and interned during constant pool resolution. This happens upon the first execution of the LDC bytecode that loads a string literal. After the first execution, the JVM replaces the JVM_CONSTANT_UnresolvedString constant pool tag with the JVM_CONSTANT_String tag, so that the next LDC takes the existing string instead of creating a new one.

  2. No. The first use of "Test" will create a new string object. Then new String("Test") will create the second object.

  3. Yes, using HotSpot Serviceability Agent. Here is an example.

3
On

The Java compiler has special support for string literals. If it did not, creating strings in your source code would be really cumbersome; you'd have to write something like:

// Suppose that we would not have string literals like "hi"
String s = new String(new char[]{ 'h', 'i' });

To answer your questions:

  1. More or less, and if you really want to know the details, you'd have to study the source code of the JVM, which you can find at OpenJDK, but be warned that it's huge and complicated.

  2. No, those two are not equivalent. In the first case you are explicitly creating a new String object:

    String s=new String("Test");
    

    which will contain a copy of the String object represented by the literal "Test". Note that it is never a good idea to write new String("some literal") in Java - strings are immutable, and it is never necessary to make a copy of a string literal.

  3. There's no way I know of to check what's in the string pool.

2
On

That's not tightly related to the subject, but whenever you have doubts as to what the Java compiler will do, you can use

javap -c CompiledClassName

to print the bytecode it actually generated. (Run it from the directory that contains CompiledClassName.class.)

To add to Jesper's answer, there are more mechanisms at work. When you concatenate a String from literals or from final variables initialized with literals, the result is a compile-time constant, so it still comes from the intern pool:

String s0 = "te" + "st";
String s1 = "test";
final String s2 = "te";
String s3 = s2 + "st";
System.out.println(s0==s1); //true
System.out.println(s3==s1); //true

But when you concatenate using non-final variables, the result is built at run time and does not come from the pool:

String s0 = "te";
String s1 = s0 + "st";
String s2 = "test";
System.out.println(s1 == s2); //false
0
On

I believe that the underlying mechanism for concatenating Strings is a StringBuilder, which assembles the String object at the end. At least I know for sure that this is what happens if you have a string that you want to extend, for example:

String str = "my String";
// and then do
System.out.println(str + "new content");

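What this does is create a StringBuilder from the old string, append the new content, and construct a new String from the builder; the original String is never modified. (This is what javac generated before Java 9; newer versions use invokedynamic instead, with the same visible result.) This is why it is more memory efficient to use a StringBuilder than a regular String when you repeatedly append to it.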
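As a rough illustration of the difference, here is a small sketch that builds the same text both ways. The class and method names are hypothetical, and no timing claims are made; the point is only that += produces a new String on every iteration, while the builder appends into one buffer.

```java
// Minimal sketch; AppendDemo and its methods are hypothetical names.
class AppendDemo {

    // Each += produces a brand-new String, since Strings are immutable.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // A single StringBuilder accumulates the parts; only toString()
    // creates the final String.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "my ", "String", " new content" };
        System.out.println(joinWithConcat(parts));  // my String new content
        System.out.println(joinWithBuilder(parts)); // my String new content
    }
}
```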

There is a way to get at the pool of already created Strings: the String.intern() method. It returns the pooled String with the same contents, adding the string to the pool first if it is not there yet, so equal strings end up sharing one object. Interned strings can therefore be compared with the == operator, and sharing them saves memory.
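A short sketch of what intern() does to a string built at run time; the class and method names are invented for this example.

```java
// Minimal sketch; InternDemo and its methods are hypothetical names.
class InternDemo {

    // Concatenation with a non-final variable happens at run time and
    // yields a new object that is not the pooled "test" instance.
    static boolean runtimeStringIsNotPooled() {
        String prefix = "te";
        String built = prefix + "st";
        return built != "test" && built.equals("test");
    }

    // intern() returns the canonical pooled instance, which is the same
    // object the literal "test" refers to.
    static boolean internReturnsPooledInstance() {
        String prefix = "te";
        String built = prefix + "st";
        return built.intern() == "test";
    }

    public static void main(String[] args) {
        System.out.println(runtimeStringIsNotPooled());    // true
        System.out.println(internReturnsPooledInstance()); // true
    }
}
```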

0
On

The following is a slight simplification, so don't try to cite exact details from it, but the general principles apply.

Each compiled Java class contains a data blob that indicates how many strings were declared in that class file, how long each one is, and the characters that belong in all of them. When the class is loaded, the class loader will create a String[] of suitable size to hold all of the strings defined in that class; for each string, it will then generate a char[] of suitable size, read the appropriate number of characters from the class file into the char[], create a String encapsulating those characters, and store the reference into the class's String[].

When compiling some class (e.g. Foo), the compiler knows which string literal it encounters first, second, third, fifth, etc. If code says myString = "George"; and "George" was the sixth string literal, that will appear in code as a "load string literal #6" instruction; the just-in-time compiler, when it is generating code for that instruction, will generate an instruction to fetch the sixth string reference associated with that class.
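The simplified picture above can be mimicked with a toy model in plain Java. This is emphatically not how any real JVM is implemented: ToyClassStrings, its fields, and its load method are all made up, and the shared map is an added assumption that stands in for the JVM-wide pool discussed elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; every name here is hypothetical.
class ToyClassStrings {
    // Stands in for the JVM-wide string pool (an assumption of this sketch).
    private static final Map<String, String> POOL = new HashMap<>();

    // Per-class table of String references, filled at "class load" time.
    private final String[] table;

    ToyClassStrings(String... literalsFromClassFile) {
        table = new String[literalsFromClassFile.length];
        for (int i = 0; i < table.length; i++) {
            // Read the characters into a char[] and wrap them in a String.
            char[] chars = literalsFromClassFile[i].toCharArray();
            String created = new String(chars);
            // Reuse an equal String if the pool already holds one.
            String existing = POOL.putIfAbsent(created, created);
            table[i] = (existing != null) ? existing : created;
        }
    }

    // Models a "load string literal #n" instruction (0-based index here).
    String load(int index) {
        return table[index];
    }

    public static void main(String[] args) {
        ToyClassStrings foo = new ToyClassStrings("a", "b", "c", "d", "e", "George");
        ToyClassStrings bar = new ToyClassStrings("George");
        System.out.println(foo.load(5));                // George
        System.out.println(foo.load(5) == bar.load(0)); // true
    }
}
```

In this model the sixth literal of Foo sits at index 5, and a second "class" that declares the same literal ends up with a reference to the same object.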