Recently I have been extremely interested in language development. I've got multiple working front ends and have had various systems for executing the code. I've decided I would like to try to develop a virtual-machine-type system (kind of like the JVM, but much simpler, of course). So far I've managed to create a basic working instruction set with a stack and registers, but I'm curious about how some things should be implemented.
In Java, for example, after you've written a program you compile it with the Java compiler, which creates a binary (.class) for the JVM to execute. I don't understand how this is done. How does the JVM interpret this binary? What's the transition from human-readable instructions to this binary? How could I create something similar?
Thanks for any help/suggestions!
Alright, I'll bite on this generic question.
Implementing a compiler/assembler/VM combo is a tall order, especially if you're doing it by yourself. That being said, if you keep your language specification simple enough, it is quite doable, even by yourself.
Basically, to create a binary, the following is done (this is a tad bit simplified):
1) Input source is read, lexed into tokens, and parsed
2) The program logic is analyzed for semantic correctness.
E.g. C++ code can tokenize and parse cleanly and still fail semantic analysis, for instance because of a type mismatch in an assignment
3) Build an Abstract Syntax Tree to represent the statements
4) Build symbol tables and resolve identifiers
5) Optional: Optimization of code
6) Generate code in an output format of your choice; for example binary opcodes/operands, string tables. Whatever format suits your needs best. Alternatively, you could create bytecode for an existing VM, or for a native CPU.
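To make the steps above more concrete, here is a minimal sketch for a toy expression language (integers, `+`, `*`, parentheses). Everything in it is an assumption made for illustration: the `Op` values, the `Token` and `Node` layouts, and the function names `lex`, `emit`, and `compile` are hypothetical and do not come from any real VM. Semantic analysis, symbol tables, and optimization are left out because this toy language has no identifiers or types.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical opcodes for a toy stack machine. PUSH takes a one-byte operand.
enum Op : std::uint8_t { PUSH = 1, ADD = 2, MUL = 3 };

// Step 1: the lexer turns characters into tokens.
struct Token { char kind; int value; };  // kind: 'n' for a number, else + * ( )

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < src.size();) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            int v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                v = v * 10 + (src[i++] - '0');
                // Assumption: literals must fit in the one-byte PUSH operand.
                if (v > 255) throw std::runtime_error("literal too large");
            }
            out.push_back({'n', v});
        } else if (c == '+' || c == '*' || c == '(' || c == ')') {
            out.push_back({static_cast<char>(c), 0});
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return out;
}

// Step 3: the parser builds an abstract syntax tree.
struct Node {
    char kind;                       // 'n' literal, '+' or '*' binary operator
    int value;                       // meaningful only when kind == 'n'
    std::unique_ptr<Node> lhs, rhs;  // set only for operators
};

class Parser {
public:
    explicit Parser(std::vector<Token> t) : toks_(std::move(t)) {}
    std::unique_ptr<Node> parse() {
        auto n = sum();
        if (pos_ != toks_.size()) throw std::runtime_error("trailing tokens");
        return n;
    }
private:
    bool at(char k) const { return pos_ < toks_.size() && toks_[pos_].kind == k; }
    static std::unique_ptr<Node> binary(char k, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
        return std::unique_ptr<Node>(new Node{k, 0, std::move(l), std::move(r)});
    }
    std::unique_ptr<Node> sum() {      // sum := product ('+' product)*
        auto n = product();
        while (at('+')) { ++pos_; n = binary('+', std::move(n), product()); }
        return n;
    }
    std::unique_ptr<Node> product() {  // product := atom ('*' atom)*
        auto n = atom();
        while (at('*')) { ++pos_; n = binary('*', std::move(n), atom()); }
        return n;
    }
    std::unique_ptr<Node> atom() {     // atom := number | '(' sum ')'
        if (at('n'))
            return std::unique_ptr<Node>(new Node{'n', toks_[pos_++].value, nullptr, nullptr});
        if (at('(')) {
            ++pos_;
            auto n = sum();
            if (!at(')')) throw std::runtime_error("expected ')'");
            ++pos_;
            return n;
        }
        throw std::runtime_error("expected number or '('");
    }
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// Step 6: a post-order walk emits operands first, then the operator.
void emit(const Node& n, std::vector<std::uint8_t>& code) {
    if (n.kind == 'n') {
        code.push_back(PUSH);
        code.push_back(static_cast<std::uint8_t>(n.value));
        return;
    }
    emit(*n.lhs, code);
    emit(*n.rhs, code);
    code.push_back(n.kind == '+' ? ADD : MUL);
}

std::vector<std::uint8_t> compile(const std::string& src) {
    std::vector<std::uint8_t> code;
    emit(*Parser(lex(src)).parse(), code);
    return code;
}
```

Calling `compile("1 + 2 * 3")` yields the byte sequence `PUSH 1, PUSH 2, PUSH 3, MUL, ADD`. The operator precedence is encoded in the grammar (`sum` calls `product`), so the multiplication ends up deeper in the tree and is emitted first.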
EDIT If you want to devise your own bytecode format, you can write, for example:
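One possible shape for such a format, as a hypothetical sketch rather than a recommendation: a short header (magic bytes plus a version byte) followed by raw instructions, where each instruction is a one-byte opcode optionally followed by a one-byte operand. The magic value `TOYB`, the opcode numbers, and the names `make_image` and `run` are all made up for this example. The JVM's .class format follows the same broad idea at a much larger scale (it also starts with a magic number, 0xCAFEBABE). The `run` function shows the other half of the question: interpreting the binary is a fetch-decode-execute loop over those bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical one-byte opcodes. PUSH is followed by a one-byte operand.
enum Op : std::uint8_t { PUSH = 1, ADD = 2, MUL = 3, HALT = 0xFF };

// Hypothetical container layout: 4 magic bytes, 1 version byte, then instructions.
const std::uint8_t kMagic[4] = {'T', 'O', 'Y', 'B'};
const std::uint8_t kVersion = 1;
const std::size_t kHeaderSize = 5;

// Wrap raw instructions in the header so a loader can recognize the file.
std::vector<std::uint8_t> make_image(const std::vector<std::uint8_t>& code) {
    std::vector<std::uint8_t> image(kMagic, kMagic + 4);
    image.push_back(kVersion);
    image.insert(image.end(), code.begin(), code.end());
    return image;
}

// Validate the header, then fetch, decode, and execute one instruction at a time.
int run(const std::vector<std::uint8_t>& image) {
    if (image.size() < kHeaderSize ||
        !std::equal(kMagic, kMagic + 4, image.begin()) ||
        image[4] != kVersion)
        throw std::runtime_error("not a valid image");

    std::vector<int> stack;
    auto pop = [&stack]() -> int {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        int v = stack.back();
        stack.pop_back();
        return v;
    };

    std::size_t pc = kHeaderSize;  // program counter: index of the next byte to fetch
    while (pc < image.size()) {
        switch (image[pc++]) {
        case PUSH:
            if (pc >= image.size()) throw std::runtime_error("truncated PUSH");
            stack.push_back(image[pc++]);
            break;
        case ADD: { int b = pop(), a = pop(); stack.push_back(a + b); break; }
        case MUL: { int b = pop(), a = pop(); stack.push_back(a * b); break; }
        case HALT:
            return pop();  // result is whatever is on top of the stack
        default:
            throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("missing HALT");
}
```

For example, `run(make_image({PUSH, 1, PUSH, 2, PUSH, 3, MUL, ADD, HALT}))` evaluates 1 + 2 * 3. Writing the `image` vector to disk with a binary `std::ofstream` would give you the file your VM loads later; a real format would also need wider operands, a constant or string table, and a defined byte order.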
END
Overall, the steps are manageable, but, as always, the devil is in the details.
Some good references are:
The Dragon Book - This is heavy on theory, so it's a dry read, but worthwhile
Game Scripting Mastery - Guides you along while developing all three components in a more practical manner. The example code is rife with security issues, memory leaks, and overall lousy coding style (imho). Still, you can take a lot of concepts away from this book, and it's worth a read.
The Art of Compiler Design - I have not read this one personally, but have heard positive things about it.
If you decide to go down this road, be sure you know what you're getting yourself into. This is not something for the faint of heart, or for someone new to programming. It requires a lot of conceptual thinking and prior planning. It is, however, quite rewarding and fun.