Why is stack memory usage in C++ determined at compile time?


I first started going down this rabbit hole after learning that VLAs (variable-length arrays) are not part of standard C++. This is because an array of variable length does not have a size that is constant at compile time, and the C++ compiler wants that constant in order to compute the size of the stack frame the program will allocate at runtime. I understand that this desire to know such a thing at compile time is why VLAs are prohibited (among other reasons).
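
For concreteness, this is the kind of declaration I mean (valid C99, rejected by standard C++, though some compilers accept it as an extension):

void f(int n)
{
    int buffer[n];    // variable-length array: the size comes from a runtime value
    buffer[0] = 42;
}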

However, why is there a desire to know such a thing (the size of the stack frame to be allocated) at compile time? Why can't all of the memory needed for the stack be determined and allocated at runtime? I've heard it explained that determining the size of the stack frame at compile time can be useful for avoiding stack overflows, since you know exactly how much memory will be needed for a given function call ahead of time. If we allowed a VLA on the stack whose size was determined by user input at runtime, the user could enter a giant number like 100000000000 and we would get a stack overflow. Prohibiting VLAs saves us from doing something like that (or at least that is my understanding).

That's how it was explained to me, but I still don't understand why that means everything has to be determined at compile time. For example, alloca can be used to allocate on the stack at runtime by simply moving the stack pointer. Why can't this be done internally for each new local variable at runtime, so we could have VLAs instead of allocating the whole stack frame at once? I feel like the answer is supposed to be "well, obviously, so you don't end up trying to allocate more memory than you have and overflow the stack!", but isn't that a risk no matter what technique you employ? I could just declare my variable-length array on the heap at runtime, but what if I try to allocate more memory than is available? In that case I run into the same issue of biting off more memory than I can chew due to user input, and calculating the program's memory usage at compile time does nothing to save me.
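
Here is the kind of heap-based version I have in mind (sketch only); it can also fail if n is huge, and no compile-time calculation can prevent that:

#include <cstddef>
#include <vector>

void heap_version(std::size_t n)      // n comes from user input
{
    std::vector<int> data(n);         // runtime-sized array on the heap
    // If n is absurdly large this simply fails at runtime (it throws);
    // knowing frame sizes at compile time would not have helped here either.
    data[0] = 1;
}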

So, is the reason this is done at compile time solely for the sake of efficiency? If it is, what makes it more efficient? Is it just the fact that we're calculating how much memory needs to be allocated for the function call ahead of time rather than at runtime, saving us microseconds each time the function is called?

There are 3 answers below.

Mike Nakis (Best Answer)

In order to generate instructions that access local variables, the compiler needs to know the offset of each local variable in the stack at compile time, and of course that offset cannot change during execution.

Since the offset of every local variable must be fixed, and since local variables are stored in sequential locations in the stack, the size of every local variable must be fixed.

That's why variable-length arrays (and variable-length anything for that matter) are difficult to implement on the stack in low-level languages that aim to be highly efficient.

The function alloca() can be used to allocate on the stack at runtime, and that may actually be how variable-length arrays are implemented by those compilers that go above and beyond the standard and offer variable-length arrays. (I do not really know, I am hypothesizing here.)
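
To illustrate the idea, here is a rough sketch of one possible lowering (not what any particular compiler actually emits, and alloca() itself is non-standard):

// A VLA such as:      void f(int n) { int a[n]; a[0] = 1; }
// could conceivably be lowered by the compiler into something like:

#include <alloca.h>    // non-standard header; its location varies by platform

void f(int n)
{
    int* a = static_cast<int*>(alloca(n * sizeof(int)));   // hidden pointer variable
    a[0] = 1;    // every element access now goes through that pointer
}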

However:

  1. It is a bit clunky, because alloca() returns a pointer, which means that we now need an extra hidden local variable to store that pointer.
  2. It is appreciably more complicated, because we now have one more level of indirection: if you want to access an element in the array, you cannot just use an offset from the base pointer; you first have to go to an offset from the base pointer to obtain the array pointer, and then access the element at an offset from the array pointer.
  3. It is dangerous, because it can cause a stack overflow for reasons that have to do with how the software is used rather than with how it is structured.
  4. It complicates the sizeof operator, which must work in a very different way for variable-length arrays than for anything else.

Appendix: Why the compiler needs to know the offset of each local variable

When emitting the code of a function, the compiler generates (among other things) instructions that access local variables.

An instruction accesses a local variable using the offset of that local variable in the stack, that is, its offset from the stack pointer, or from the base pointer if a base pointer is used.

For example:

mov ax, [bp-8]

Here, the register ax is being loaded with the contents of the local variable at offset 8 from the base pointer (bp). The base pointer points at the location that was the top of the stack when the function was entered. This instruction uses a special addressing mode for accessing memory at a fixed offset from a certain register. The instruction requires the 8 to be a constant baked into the instruction; it cannot be anything else. (If that 8 were a variable, accessing local variables would become immensely more complicated, possibly requiring multiple instructions for each access, with a corresponding performance penalty, and on top of that the compiler would have to be considerably more complicated.)
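
To make this concrete, here is a tiny function; the offsets in the comments are purely illustrative, since the actual layout is decided by the compiler and the ABI:

void g()
{
    int x = 1;    // might be placed at, say, [bp-4]
    int y = 2;    // might be placed at, say, [bp-8]
    y = x;        // compiles to accesses with constant offsets baked in,
                  // e.g. something like  mov ax, [bp-4]  then  mov [bp-8], ax
}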

Eric Postpischil

I understand that this desire to know such a thing at compile time is why VLAs are prohibited (among other reasons).

Variable length arrays are not prohibited. The C++ standard does not require that implementations support them. However, the C++ standard allows extensions, and C++ implementations may support variable length arrays.

There is no impediment to implementing variable length arrays, as evidenced by the fact that an older version of the C standard required support and that some C and C++ implementations support them.
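
For example, GCC and Clang accept the following in C++ as an extension (as far as I know), and can be asked to flag it with -Wvla or -pedantic:

// Sketch: compile with e.g.  g++ -std=c++17 -Wvla demo.cpp  to see the warning.
void demo(int n)
{
    int scratch[n];    // accepted as an extension, not standard C++
    scratch[0] = 0;
}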

I've heard it explained that determining the size of the stack frame at compile time can be useful for avoiding stack overflows, since you know exactly how much memory will be needed for a given function call ahead of time.

This is incorrect. A function can call other functions, the functions called can be recursive (to a depth decided at runtime), and which functions are called can itself depend on runtime data. So, while the amount of stack memory needed solely for the direct data of the function can be known during compilation (in the absence of variable length arrays), the total stack memory needed for a call to the function cannot always be known by the compiler. So the absence of variable length arrays is insufficient to avoid stack overflows in this way.
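
A minimal illustration: each call below has a frame whose size is fixed at compile time, yet the total stack usage depends on a runtime value and can still overflow:

#include <cstddef>

std::size_t depth(std::size_t n)
{
    // The frame size of depth() is known at compile time, but the number of
    // frames live on the stack at once is n, which only the caller decides.
    return n == 0 ? 0 : 1 + depth(n - 1);
}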

There is less need for variable length arrays in C++ because C++ provides other features for dynamic data structures, so we may conjecture the C++ committee has not found variable length arrays to be sufficiently valuable in C++ to require supporting them.

anatolyg

I think the question you really want to ask is why C++ doesn't have VLA in its spec. I agree with the existing answer that there is a good alternative (vector).
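
For comparison, the idiomatic replacement is simply (a sketch):

#include <cstddef>
#include <vector>

void process(std::size_t n)
{
    std::vector<int> buffer(n);    // runtime size, storage on the heap
    buffer[0] = 0;
    // sizeof(buffer) is still a compile-time constant, so none of the
    // type-system questions below even arise.
}
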

In addition, VLA types mess up the C++ type system. Their size is not a compile-time constant, which is unthinkable in the much more developed C++ type system. How would overload resolution work with VLA types? Could you use sizeof in SFINAE constructs? How would default constructors be generated? I think you could answer all these questions, but doing that would require much thought, inevitably generate bugs and reduce flexibility for future changes. All that for small gain: you could use VLA instead of vector or alloca. Yay. Not.