How do virtual machines like Lua or JVM represent (and work on) larger data types?


I'm currently working on a toy virtual machine with its own assembly language modelled after ARM, and I'm working towards getting something like this to run:

// adds the r1 and r2 registers, result goes in r0
add r0, r1, r2

Here's the 64-bit instruction encoding I'm considering:

// 00000000 00000000 0000000000000000 0000000000000000 0000000000000000
//   inst     reg          op1              op2              op3

So using the example above, it would be 8 bits for the add opcode, 8 bits for the destination register r0, and 16 bits each for up to three operands (registers, literals, etc.).
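For reference, this layout could be packed and unpacked in C roughly like this (a sketch; the function names are my own):

```c
#include <stdint.h>

// Field layout matching the diagram above:
// [63:56] opcode, [55:48] dest register, [47:32] op1, [31:16] op2, [15:0] op3
static inline uint64_t encode(uint8_t opcode, uint8_t rd,
                              uint16_t op1, uint16_t op2, uint16_t op3) {
    return ((uint64_t)opcode << 56) | ((uint64_t)rd << 48) |
           ((uint64_t)op1 << 32) | ((uint64_t)op2 << 16) | (uint64_t)op3;
}

static inline uint8_t  inst_opcode(uint64_t w) { return (uint8_t)(w >> 56); }
static inline uint8_t  inst_rd(uint64_t w)     { return (uint8_t)(w >> 48); }
static inline uint16_t inst_op1(uint64_t w)    { return (uint16_t)(w >> 32); }
static inline uint16_t inst_op2(uint64_t w)    { return (uint16_t)(w >> 16); }
static inline uint16_t inst_op3(uint64_t w)    { return (uint16_t)w; }
```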

But then I got to thinking: this works fine for integers, but how would an instruction that adds two 32-bit floats work? There isn't enough space in the instruction to store both floats alongside the opcode and destination register. I'm developing my VM in C, and as far as I know 16- and 24-bit floats aren't standard types there.

Does anyone have any insights on how other VMs tackle this issue?

1 Answer

Answered by Holger:

When you say your architecture is “modelled after ARM”, you mean it uses three explicitly encoded operands and a fixed instruction size.

You couldn’t be farther away from the JVM’s bytecode format. The JVM uses a variable instruction size and the instructions are not even grouped in a way that would make it easy to recognize the size of an instruction, as discussed in Is there a clever way to determine the length of Java bytecode instructions? A bytecode processor simply has to know all the instructions and their sizes.

Further, the bytecode uses an operand stack, rather than explicitly encoded operands. The instruction to add two double values, dadd, consists of a single byte opcode only. It’s defined to pop two values from the operand stack, add them, and push the result back. This means the preceding instructions determine the actual sources of the arguments, and the following instruction determines where the result goes, as discussed in What is the role of operand stack in JVM?
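To make the contrast concrete, here is a minimal sketch in C of how a stack-based interpreter might dispatch such a single-byte add (the opcode names are mine, not the JVM's actual encoding):

```c
// Minimal operand-stack interpreter sketch (hypothetical opcodes).
enum { OP_DCONST_0, OP_DCONST_1, OP_DADD, OP_HALT };

double run(const unsigned char *code) {
    double stack[16];
    int sp = 0;                      // next free stack slot
    for (;;) {
        switch (*code++) {
        case OP_DCONST_0: stack[sp++] = 0.0; break;
        case OP_DCONST_1: stack[sp++] = 1.0; break;
        case OP_DADD: {              // pop two, push sum: no operands in the instruction
            double b = stack[--sp];
            double a = stack[--sp];
            stack[sp++] = a + b;
            break;
        }
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

For example, the byte sequence {OP_DCONST_1, OP_DCONST_1, OP_DADD, OP_HALT} leaves 2.0 on the stack. The add instruction never names its inputs; the instructions before it put them in place.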

This provides great flexibility. The preceding instructions might have pushed the contents of local variables, the closest equivalent to CPU registers, but they could also have pushed constant values instead. For both options, there are optimized variants; the linked Q&A names the different variants for addressing local variables or loading int constants. For double constants, there are single-byte instructions, dconst_0 and dconst_1, to push the common values zero and one. But you could also use the three-byte sequence bipush 100, i2d to encode the value 100.0, for example, which is still more compact than encoding the eight-byte value 100.0 directly. Finally, for arbitrary values, the instruction ldc2_w loads them from the constant pool of a class.
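A register-based toy VM can borrow the constant-pool idea directly: the instruction carries only a 16-bit index, which fits any of the operand slots, while the full 8-byte double lives in a table built when the program is loaded. A sketch (all names are my own):

```c
#include <stdint.h>

// ldc2_w-style load for the toy VM: the instruction encodes a 16-bit index,
// not the value itself; the constant pool holds the actual doubles.
typedef struct {
    const double *const_pool;  // filled in by the assembler/loader
    double stack[16];
    int sp;
} VM;

static void op_ldc2(VM *vm, uint16_t index) {
    vm->stack[vm->sp++] = vm->const_pool[index];
}
```

The same table works for a register machine; the destination would simply be a register instead of the stack top.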

Besides constants and the values of variables, addressing via the operand stack allows an instruction to consume the result of a previous calculation directly. Technically, the bipush 100, i2d example already did this: i2d consumed the int pushed by bipush. The Q&A linked above discusses an example with a more complex calculation.


If you keep modelling your instruction set after ARM, it would be more useful to look at how ARM code deals with constants that don’t fit into a fixed-size instruction, for example by splitting them across movw/movt pairs or loading them from a nearby literal pool, rather than looking at a JVM.
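The movw/movt approach maps directly onto the 16-bit operand slots of the encoding in the question: two instructions build a 32-bit constant, including a float's bit pattern, from two 16-bit immediates. A sketch in C (the instruction names are mine, modelled on ARM's pair):

```c
#include <stdint.h>
#include <string.h>

static uint32_t regs[16];  // 32-bit VM registers

static void op_movw(int rd, uint16_t imm) {   // set low 16 bits, clear high
    regs[rd] = imm;
}

static void op_movt(int rd, uint16_t imm) {   // set high 16 bits, keep low
    regs[rd] = (regs[rd] & 0xFFFFu) | ((uint32_t)imm << 16);
}

static float reg_as_float(int rd) {           // reinterpret the bits as a float
    float f;
    memcpy(&f, &regs[rd], sizeof f);          // memcpy avoids aliasing issues
    return f;
}
```

Loading 1.5f (bit pattern 0x3FC00000) then takes two instructions: op_movw(0, 0x0000) followed by op_movt(0, 0x3FC0). The float-add instruction itself can stay register-to-register and never needs to embed a float.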