Data analysis of Java bytecode


I am working on a Java bytecode analysis project, which is written in C.

The current stage is to write a stack simulator that models the state of the operand stack while the JVM executes the methods of a class file.

After splitting each method into basic blocks and building the control flow graph, I ran into a question. My two guesses are:

  • Is the stack empty after each basic block has executed?
  • Or, where branches converge, is the content of the stack the same no matter which branch leads into the next basic block?

eg:

public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=3, args_size=1
     0: sipush        2000
     3: istore_1
     4: iload_1
     5: sipush        200
     8: if_icmple     22
    11: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
    14: ldc           #13                 // String demo
    16: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    19: goto          26
    22: invokestatic  #21                 // Method age:()I
    25: pop
    26: bipush        100
    28: istore_2
    29: return

Here I call the method age:()I, which returns an int value. Before control reaches the next basic block, the compiler adds a pop at offset 25 to clear the operand stack.


There are 1 best solutions below

dan1st might be happy again On BEST ANSWER

Is the stack empty after each basic block executed?

This is specified in section 2.6.4. Normal Method Invocation Completion of the JVM specification:

The current frame (§2.6) is used in this case to restore the state of the invoker, including its local variables and operand stack, with the program counter of the invoker appropriately incremented to skip past the method invocation instruction.

So, when returning, the stack needs to be cleaned up to make sure anything pushed to it is popped back.

However, the JVM itself doesn't have a concept of blocks. Those exist only in the Java language (the bytecode just has branch instructions such as goto). So a compiler can generate bytecode that leaves elements on the stack when returning, and the JVM discards them together with the frame.
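To make the point about returning concrete, here is a minimal sketch in C of how a simulator could model frames. It is not taken from any real JVM; the names `Frame`, `Thread`, `invoke`, `ireturn`, `op_push`, `op_pop` and `stack_depth` are made up for illustration, and values are simplified to plain `int`. Each invocation gets its own operand stack, and a return drops the whole callee frame, including anything still sitting on its stack.

```c
#include <assert.h>

#define MAX_STACK  8   /* arbitrary limits for this sketch */
#define MAX_FRAMES 4

/* One frame per method invocation, each with its own operand stack. */
typedef struct {
    int stack[MAX_STACK];
    int sp;                 /* number of values currently on the stack */
} Frame;

typedef struct {
    Frame frames[MAX_FRAMES];
    int depth;              /* number of active frames */
} Thread;

static Frame *current(Thread *t) {
    assert(t->depth > 0);
    return &t->frames[t->depth - 1];
}

void op_push(Thread *t, int v) {
    Frame *f = current(t);
    assert(f->sp < MAX_STACK);
    f->stack[f->sp++] = v;
}

int op_pop(Thread *t) {
    Frame *f = current(t);
    assert(f->sp > 0);
    return f->stack[--f->sp];
}

int stack_depth(Thread *t) { return current(t)->sp; }

/* Entering a method: a fresh frame with an empty operand stack.
   (Argument passing and locals are left out of this sketch.) */
void invoke(Thread *t) {
    assert(t->depth < MAX_FRAMES);
    t->frames[t->depth++].sp = 0;
}

/* ireturn: take the top value of the callee, discard the callee frame
   with whatever else is left on its stack, and push the value onto the
   invoker's operand stack. */
void ireturn(Thread *t) {
    int result = op_pop(t);
    assert(t->depth > 1);
    t->depth--;
    op_push(t, result);
}
```

With this model, a callee that pushes 1, 2 and 42 and then executes `ireturn` leaves the invoker with exactly one extra value (42) on top of whatever it had before the call; the leftover 1 and 2 vanish with the callee frame.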

For other blocks, however, the JVM needs to make sure that type checking still works. As I understand it, on no execution path should it be possible to, e.g., push an int and then treat it as an Object.

I think the relevant part of the specification should be in 4.10.1.6. Type Checking Methods with Code:

A merged code stream is type safe relative to an incoming type state T if it begins with an instruction I that is type safe relative to T, and I satisfies its exception handlers (see below), and the tail of the stream is type safe given the type state following that execution of I.
NextStackFrame indicates what falls through to the following instruction. For an unconditional branch instruction, it will have the special value afterGoto. ExceptionStackFrame indicates what is passed to exception handlers.

And regarding

Or when each branch is converging, no matter which branch goes to the next basic block, the content on the stack will be the same?

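I don't think the contents need to be exactly the same, but they must not lead to a type checking violation; otherwise class file verification will fail. What does have to match at a merge point is the stack depth: every incoming path must arrive with the same number of operand stack entries. Only the types of those entries may differ, as long as they can be merged into a common type.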
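Since the project is written in C, here is a rough sketch of how the depth part of that rule could be checked in a simulator. Everything in it is an assumption for illustration: the `Insn` struct, `check_stack_depths` and the `MAIN_CODE` table are hypothetical names, and the stack deltas are transcribed by hand from the listing in the question. It only tracks how many values are on the stack, not their types, so it is much weaker than the real verifier.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded instruction, reduced to what a depth analysis needs. */
typedef struct {
    int pc;      /* bytecode offset */
    int delta;   /* net operand stack change (pushes minus pops) */
    int target;  /* branch target offset, or -1 if none */
    int falls;   /* 1 if control can fall through to the next instruction */
} Insn;

#define UNVISITED (-1)

static int index_of(const Insn *code, size_t n, int pc) {
    for (size_t i = 0; i < n; i++)
        if (code[i].pc == pc) return (int)i;
    return -1;
}

/* Record the depth on entry to instruction idx.
   Returns 1 if newly set, 0 if it agrees, -1 on a mismatch. */
static int flow_into(int *depth_in, int idx, int depth) {
    if (idx < 0) return -1;                 /* branch to unknown offset */
    if (depth_in[idx] == UNVISITED) { depth_in[idx] = depth; return 1; }
    return depth_in[idx] == depth ? 0 : -1;
}

/* Propagates stack depths until nothing changes. Returns the maximum
   depth seen, or -1 if two paths reach an instruction with different
   depths or the net depth drops below zero. A real simulator would
   check pops and pushes separately and count long/double as two slots. */
int check_stack_depths(const Insn *code, size_t n, int *depth_in) {
    int max_depth = 0, changed = 1;
    assert(code != NULL && depth_in != NULL);
    if (n == 0) return -1;
    for (size_t i = 0; i < n; i++) depth_in[i] = UNVISITED;
    depth_in[0] = 0;                        /* method entry: empty stack */
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (depth_in[i] == UNVISITED) continue;
            int out = depth_in[i] + code[i].delta;
            if (out < 0) return -1;
            if (out > max_depth) max_depth = out;
            if (code[i].target >= 0) {
                int r = flow_into(depth_in, index_of(code, n, code[i].target), out);
                if (r < 0) return -1;
                changed |= r;
            }
            if (code[i].falls) {
                if (i + 1 >= n) return -1;  /* falls off the end */
                int r = flow_into(depth_in, (int)(i + 1), out);
                if (r < 0) return -1;
                changed |= r;
            }
        }
    }
    return max_depth;
}

/* The main method from the question, transcribed by hand. */
const Insn MAIN_CODE[] = {
    { 0, +1, -1, 1},  /* sipush 2000 */
    { 3, -1, -1, 1},  /* istore_1 */
    { 4, +1, -1, 1},  /* iload_1 */
    { 5, +1, -1, 1},  /* sipush 200 */
    { 8, -2, 22, 1},  /* if_icmple 22 */
    {11, +1, -1, 1},  /* getstatic System.out */
    {14, +1, -1, 1},  /* ldc "demo" */
    {16, -2, -1, 1},  /* invokevirtual println:(Ljava/lang/String;)V */
    {19,  0, 26, 0},  /* goto 26 */
    {22, +1, -1, 1},  /* invokestatic age:()I */
    {25, -1, -1, 1},  /* pop */
    {26, +1, -1, 1},  /* bipush 100 */
    {28, -1, -1, 1},  /* istore_2 */
    {29,  0, -1, 0},  /* return */
};
#define MAIN_CODE_LEN (sizeof MAIN_CODE / sizeof MAIN_CODE[0])
```

Calling `check_stack_depths(MAIN_CODE, MAIN_CODE_LEN, depth_in)` with an `int depth_in[14]` buffer gives a consistent result: offset 26 is reached with an empty stack both from the `goto` at 19 and from the `pop` at 25, which is exactly why the compiler emitted that `pop`. If you change the delta of the `pop` entry to 0, the two paths arrive at 26 with depths 0 and 1 and the function reports a mismatch.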