What's an efficient way to store variables? (Home-Built Virtual Machine)

A friend and I are thinking about writing our own programming language, using just-in-time compilation. We have agreed on the assembly we are going to use, but one thing we aren't sure about is how to store variables. What we did agree on is how they are structured.

Variables will be replaced by keys (during compilation). Every key is a 2-byte integer ranging from 1 to 65535. When a variable lives inside a namespace, for example, the key will consist of a 2-byte integer containing the key of the namespace, followed by a 2-byte integer containing the key of the actual variable.

So, for example, if I have a namespace foo with a variable test in it, namespace foo will be assigned key 1 and variable test inside it will be assigned key 1->1 (first variable in first namespace). In the assembly itself, we terminate these keys with a NULL byte. (Keep in mind this is the compiled assembly rather than the source code before compilation.)

GETV 1 1 0 SET 5 RET

This Assembly will get variable test out of namespace foo, and set it to 5. It'll then return that variable.
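To make the key format concrete, here is a rough Python sketch of reading one key back out of the compiled byte stream. It is only an illustration: the function name read_key, the big-endian byte order, and the use of a 2-byte zero as the terminator are assumptions (a 2-byte zero cannot collide with a real key, since keys start at 1).

```python
def read_key(code, pc):
    """Read 2-byte key components starting at offset pc until a zero component.

    Returns (key_path, next_pc). Byte order and terminator width are assumed.
    """
    path = []
    while True:
        part = int.from_bytes(code[pc:pc + 2], "big")
        pc += 2
        if part == 0:          # terminator: end of this key
            return tuple(path), pc
        path.append(part)


# Operand bytes for "1 1 0", i.e. foo::test
operands = bytes([0, 1, 0, 1, 0, 0])
key, next_pc = read_key(operands, 0)
```

Here key comes back as the tuple (1, 1), and next_pc points at whatever instruction follows the key.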

GETV 1 2 1 0 SETV 1 1 0 RET

This assembly could match the following (fictional) code:

foo::testClass::test = foo::test;
return foo::test;

Provided the following structure is given:

namespace foo { // 1 First Global Variable
    byte test = 1; // 1 1 - First Variable Inside First Global Variable
    class testClass { // 1 2 - Second Variable Inside First Global Variable
        static byte test = 0; // 1 2 1 - First Variable Inside Second Variable Inside First Global Variable
    }
}

How would I go about accessing these variables? My current plan is to store them in a hashmap, using the key (as a string) as the hash key. I don't know how to go about this, though: how would I know what type of variable is stored under a given key, how long it is, and how to do calculations with it? I understand that preventing nonsensical calculations, like adding unsigned integers to signed ones, can be handled by the compiler, but that still leaves the problem of how long the variable is and how to handle it. (Adding 2 floats would be handled differently than adding 2 integers, right?)

There is 1 best solution below

The best approach here is not to keep some strange identifiers for your variables but to use direct pointers. Once the program is compiled you will not need human-centric names anymore.
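As a rough illustration of that idea, the sketch below resolves each key path to a fixed storage slot once, at compile time, so the emitted code only ever carries the slot number. It is a Python stand-in, not a definitive design: a list index plays the role of a pointer, and SlotAllocator and slot_for are hypothetical names.

```python
class SlotAllocator:
    """Compile-time helper: hands out one storage slot per variable."""

    def __init__(self):
        self.slots = {}  # key path -> slot index, only needed while compiling

    def slot_for(self, key_path):
        # Allocate a new slot the first time a variable is seen.
        if key_path not in self.slots:
            self.slots[key_path] = len(self.slots)
        return self.slots[key_path]


alloc = SlotAllocator()
FOO_TEST = alloc.slot_for((1, 1))               # foo::test
FOO_TESTCLASS_TEST = alloc.slot_for((1, 2, 1))  # foo::testClass::test

# At run time only the resolved slots remain; no hashing, no key strings.
memory = [0] * len(alloc.slots)
memory[FOO_TEST] = 5
memory[FOO_TESTCLASS_TEST] = memory[FOO_TEST]
```

In a real JIT the slot would be a machine address or an offset from a base register, but the principle is the same: the lookup happens once in the compiler, not on every access.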

More importantly, you need to think about the structure of your variables. Depending on the syntax of your language, besides the memory that holds the value of each variable, you may need to store some metadata as well, such as the type of the variable. This information is needed only if you want to support automatic type casting. If your language is strictly typed, you can resolve all type conflicts at compile time and will not need type information at run time.
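For the strictly typed case, one possible arrangement is for the compiler to emit a different opcode per operand type, so that the opcode itself tells the VM how many bytes to read and how to add them. The Python sketch below assumes two made-up opcodes, ADD_I32 and ADD_F32, and a flat byte array as variable memory; none of these names come from the question.

```python
import struct

memory = bytearray(16)  # raw variable storage, no type tags stored anywhere

# Hypothetical typed opcodes chosen by the compiler; each implies a size and layout.
FORMATS = {
    "ADD_I32": "<i",  # 4-byte signed integer
    "ADD_F32": "<f",  # 4-byte float
}


def execute_add(opcode, dst, a, b):
    """Add the values at offsets a and b and store the result at dst."""
    fmt = FORMATS[opcode]
    x, = struct.unpack_from(fmt, memory, a)
    y, = struct.unpack_from(fmt, memory, b)
    struct.pack_into(fmt, memory, dst, x + y)


# Integer addition: 2 + 3
struct.pack_into("<i", memory, 0, 2)
struct.pack_into("<i", memory, 4, 3)
execute_add("ADD_I32", 8, 0, 4)
int_sum, = struct.unpack_from("<i", memory, 8)

# Float addition on the same offsets, reinterpreted by a different opcode: 1.5 + 2.25
struct.pack_into("<f", memory, 0, 1.5)
struct.pack_into("<f", memory, 4, 2.25)
execute_add("ADD_F32", 12, 0, 4)
float_sum, = struct.unpack_from("<f", memory, 12)
```

The same bytes are handled differently purely because the compiler picked a different instruction, which is why no per-variable type information has to exist at run time.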

Also, depending on the syntax, you may need to keep an index that maps the human-readable names of the variables to their actual addresses. This index is needed only if your language has functions similar to:

var_by_name(s:string):pointer
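If such a function is needed, the index can be as simple as a table that the compiler writes out next to the code. A minimal Python sketch, assuming slot numbers stand in for addresses and that NAME_INDEX is a hypothetical table produced by the compiler:

```python
# Slot-based variable storage; an index into this list stands in for a pointer.
memory = [0, 0]

# Hypothetical name index emitted by the compiler alongside the compiled code.
NAME_INDEX = {
    "foo::test": 0,
    "foo::testClass::test": 1,
}


def var_by_name(name):
    """Return the slot ("address") of a variable; raises KeyError if unknown."""
    return NAME_INDEX[name]


memory[var_by_name("foo::test")] = 5
```

Ordinary compiled accesses never touch this table; only explicit by-name lookups pay for the hash lookup.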