Let's start from a simple example of stack allocation:
void f() {
int a, b;
...
}
If I understand correctly. Then address of a and b has a fixed offset from stack base, namely the register ebp. That's the how compiler find them if we need them in afterwards.
But consider the following code.
void f(int n) {
int a;
alloca(n);
int b;
...
}
If compiler do not do any optimization, the stack would be a->n->b. Now offset of b is dependent on n. Then what did compiler do?
Mimicking How does alloca() work on a memory level? . I tried the following code:
#include <stdio.h>
#include <alloca.h>
void foo(int n)
{
int a;
int *b = alloca(n * sizeof(int));
int c;
printf("&a=%p, b=%p, &c=%p\n", (void *)&a, (void *)b, (void *)&c);
}
int main()
{
foo(5);
return 0;
}
The output is &a=0x7fffbab59d68, b=0x7fffbab59d30, &c=0x7fffbab59d6c. This time address of a and c looks neighboring. Did compiler do some reordering? And if we do not allow compiler to reorder how would compiler find address of c?
------------Some Update------------
Alright, as long as you believe compilers really matters, let's try x86-64 gcc 13.2, I've modify the code a little bit.
#include <alloca.h>
void alloca_test(int n) {
int a;
int* ptr = (int *) alloca(n);
int b;
a++;
b++;
ptr[0]++;
}
and here's the assembly:
alloca_test(int):
push rbp
mov rbp, rsp
sub rsp, 48
mov DWORD PTR [rbp-36], edi
mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-36]
cdqe
lea rdx, [rax+8]
mov eax, 16
sub rax, 1
add rax, rdx
mov ecx, 16
mov edx, 0
div rcx
imul rax, rax, 16
sub rsp, rax
mov rax, rsp
add rax, 15
shr rax, 4
sal rax, 4
mov QWORD PTR [rbp-16], rax
mov DWORD PTR [rbp-20], 0
add DWORD PTR [rbp-4], 1 <--a++
add DWORD PTR [rbp-20], 1 <--b++
mov rax, QWORD PTR [rbp-16]
mov eax, DWORD PTR [rax]
lea edx, [rax+1]
mov rax, QWORD PTR [rbp-16]
mov DWORD PTR [rax], edx
nop
leave
ret
Here b has address [rbp-20], which don't change w.r.t n
Actually, for a really non-optimizing compiler, it would be the other way around.
Let's imagine a naive, 1970s-tech compiler, since the C language was originally designed to be implemented by compilers like that. And let's ignore the
allocacall for the moment. As the compiler parses the function, each time it encounters a definition of a local variable [*], it assigns it a stack slot, relative to the frame pointerebp: soaat[ebp-4],bat[ebp-8]and so on, and uses that to address the variable. When it finishes parsing, it has seen all the local variables and can compute the total amount of stack needed, and so insert the appropriate constant in the prologue code that adjusts the stack pointer (e.g.sub esp, 8). Even a very simple one-pass compiler that emits code line by line can do this, by emitting something likesub esp, 0in the prologue and then back-patching later to replace the immediate0by the correct value.Now, as for
alloca, it wasn't part of the original C language. Rather, it was basically a "cool hack" that someone discovered, that can be implemented in a way that is completely orthogonal to the process above. We can describe the idea as follows in x86 terms. (The original implementation would have been for PDP-11 or VAX or something like that, but the idea is the same.) Since all the locals are addressed relative toebp, then it doesn't matter if the stack pointerespis decremented further during the execution of the function; the compiler never refers toesp. And the stack cleanup in the function epilogue is normally implemented asmov esp, ebpinstead ofadd esp, 8, so it will also continue to work fine.So in fact the compiler doesn't even have to know that
alloca(n)does anything special with regard to the stack. It can be a macro that expands to inline assembly likesub esp, [ebp+8] / mov eax, esp, which, again, can be opaque to the compiler other than filling in an addressing mode forn(in this case the first stack arg, at[ebp+8]assuming traditional use of EBP as a frame pointer). It would play no role at all in the process of allocating stack slots for local variables, since this hypothetical compiler will only ever access locals relative to a frame pointer (EBP), not ESP.In fact, if you are willing to do a little more stack juggling,
allocacan even be an external library function, that the compiler just treats like any other function call. This is whyallocahas the syntax of a function call, rather than being integrated more deeply into the language - the original implementation was simply a function call.So with this implementation, we'd have
aandbat the top of the stack frame (allocated simultaneously in the function prologue), and theallocaed buffer below them (allocated at the point of theallocacall). If we had more calls toalloca, then they would return progressively lower addresses in the order they were executed, subtracting from the stack pointer each time.An
allocathat the compiler didn't understand could break things if done in the middle of pushing args for a call to another function, so presumablybar(1, 2, alloca(n), 3)wasn't safe with early hacky implementations.Now as you might imagine, this "cool hack" didn't work so well once compilers got smarter and wanted to be able to actually control the stack pointer throughout the execution of the function. Then changing the stack pointer behind the compiler's back, as it were, becomes disastrous, so
allocahad to be given special compiler support, causing a lot of pain for compiler writers. The idea ofallocawas eventually fully adopted into the language when C99 introduced variable-length arrays, but many observers regard that decision as a mistake.So as for what a modern compiler does, there's not necessarily any single answer. It knows exactly what semantics
allocaneeds to provide, and it can make its own decisions as to how to implement that. It's not necessarily constrained to usingebp-relative addressing for all locals; it can useesp-relative, or compute the address some other way, or maybe just optimize variables into registers so that they don't occupy a stack slot at all. So we can't easily predict what the stack layout will look like.[*] Note, in passing, that in assigning stack slots and computing stack usage, the block structure of the function would typically be flattened, and the scopes of the variables would not be taken into account. So even in code like
Here
a,b,cwill each get their own stack slots, say[ebp-4],[ebp-8],[ebp-12]respectively. The stack pointer will be adjusted by 12 in the prologue. In particular, the compiler will not emit instructions to readjust the stack pointer at the beginning and end of each{ }blocks. It could, however, optimize by noticing that the scopes ofbandcdo not overlap, and so it could assign[ebp-8]to both of them, and use only 8 bytes of stack instead of 12.