This probably has a very obvious answer, but I cannot work out why, when an array is initialized within a function in RISC-V, the memory allocated to the array is (number of items)*(size of item)+15 and not just (number of items)*(size of item).
Furthermore, I don't understand why you have to perform andi (size of array in bytes), (size of array in bytes), -16.
So I guess I do not understand how much space is taken up by arrays. An example is provided below.
Example: take this C function:
int sum(int n, int m) {
    if (n < 0) return 0;
    int array[n];
    int result = 0;
    for (int i = 0; i < n; i++) {
        array[i] = m;
    }
    for (int a = 0; a < n; a++) {
        result += array[a];
    }
    return result;
}
In RISC-V (translated by Compiler Explorer with the -O1 optimization), I do not understand why lines 7 and 8 of the following code are there (you can ignore the rest of the code; my only concerns are with lines 7 and 8):
sum(int, int):
        blt a0,zero,.L5
        addi sp,sp,-16
        sw s0,12(sp)
        addi s0,sp,16
        slli a5,a0,2
        addi a5,a5,15 //**WHY??**
        andi a5,a5,-16 //**WHY??**
        sub sp,sp,a5
        mv a4,sp
        ble a0,zero,.L6
        mv a3,a4
        li a5,0
.L3:
        sw a1,0(a4)
        mv a2,a5
        addi a5,a5,1
        addi a4,a4,4
        bne a0,a5,.L3
        li a5,0
        li a0,0
.L4:
        lw a4,0(a3)
        add a0,a0,a4
        mv a4,a5
        addi a5,a5,1
        addi a3,a3,4
        bne a2,a4,.L4
.L1:
        addi sp,s0,-16
        lw s0,12(sp)
        addi sp,sp,16
        jr ra
.L5:
        li a0,0
        ret
.L6:
        li a0,0
        j .L1
From Chapter 18. Calling Convention:
And from RISC-V Calling Conventions:
This apparently applies to both RV32 and RV64.
In any case, those computations are for stack pointer/frame alignment.
Integers (int) are 4 bytes, but the stack pointer needs to be 16-byte aligned, so, for example, if n is < 4 then we really need to allocate 4 ints, so that we get 16-byte alignment.
Since they are rounding up to include alignment, and 16-byte alignment has better cache line boundary alignment, plus it's easy to use, the compilers locate the array at the aligned location and any necessary padding goes after the array in memory (the alternative would be to compute the location of the array on the stack first, then add alignment with padding in front of the array).
The initial addi #15/#16 (gcc vs. clang) rounds the size needed upwards. The final andi #-16 operation "floors" the address computation and ensures 16-byte alignment of the result. (These operations work on a stack that grows downward; a stack growing upward, or a heap-allocated object, would require a slightly modified formula, though heap objects are already aligned as needed, I believe, so that might only apply if you needed even more alignment than 16 bytes.)
If a frame pointer is used, it is supposed to be s0 aka x8, so that's what GCC and Clang are doing with s0. Using -fomit-frame-pointer doesn't remove that; it looks like they both want to use the frame pointer to restore the stack (deallocate the variable-length local array).
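The rounding arithmetic can be sketched in C. This is only an illustration of the add-15-then-mask pattern, not compiler source. The names round_up_16 and alloc_floor_16 are hypothetical, and the second function is an assumption about what the "subtract the raw size, then floor the address" variant would look like.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 16.
 * Mirrors the pair:  addi a5,a5,15  /  andi a5,a5,-16
 * (~15u has the same bit pattern as -16 in two's complement). */
static uint32_t round_up_16(uint32_t bytes) {
    return (bytes + 15u) & ~15u;
}

/* Assumed alternative form: subtract the unrounded size from the stack
 * pointer, then clear the low 4 bits of the resulting address.
 * On a downward-growing stack this rounds the address down (floors it). */
static uint32_t alloc_floor_16(uint32_t sp, uint32_t bytes) {
    return (sp - bytes) & ~15u;
}
```

With 4-byte ints, n = 1 through 4 all reserve 16 bytes and n = 5 reserves 32. When sp is already 16-byte aligned, sp - round_up_16(4*n) and alloc_floor_16(sp, 4*n) give the same address, so both forms keep the stack pointer aligned after the array is allocated.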