Initializing array in RISC-V. How much space does it need?

This probably has a very obvious answer, but I cannot work out why, when initializing an array within a function in RISC-V, the memory allocated to the array is (number of items)*(size of item)+15 and not just (number of items)*(size of item).

Furthermore I don't understand why you have to perform an andi (size of array in bytes),(size of array in bytes), -16.

So I guess I do not understand how much space is taken up by arrays. An example is provided below.

Example: take this C function:

int sum(int n, int m) {
    if (n<0) return 0;
    int array[n];
    int result=0;
    for (int i=0; i<n; i++){
        array[i]=m;
    }
    for(int a=0;a<n;a++){
        result+=array[a];
    }
    return result;
}

In RISC-V (translated by Compiler Explorer with the -O1 optimization): I do not understand why lines 7 and 8 of the following code are there (you can ignore the rest of the code; my only concerns are with lines 7 and 8).

sum(int, int):
        blt     a0,zero,.L5
        addi    sp,sp,-16
        sw      s0,12(sp)
        addi    s0,sp,16
        slli    a5,a0,2
        addi    a5,a5,15  //**WHY??**
        andi    a5,a5,-16  //**WHY??**
        sub     sp,sp,a5
        mv      a4,sp
        ble     a0,zero,.L6
        mv      a3,a4
        li      a5,0
.L3:
        sw      a1,0(a4)
        mv      a2,a5
        addi    a5,a5,1
        addi    a4,a4,4
        bne     a0,a5,.L3
        li      a5,0
        li      a0,0
.L4:
        lw      a4,0(a3)
        add     a0,a0,a4
        mv      a4,a5
        addi    a5,a5,1
        addi    a3,a3,4
        bne     a2,a4,.L4
.L1:
        addi    sp,s0,-16
        lw      s0,12(sp)
        addi    sp,sp,16
        jr      ra
.L5:
        li      a0,0
        ret
.L6:
        li      a0,0
        j       .L1
There are 2 best solutions below

Answer by Erik Eidt:

From Chapter 18. Calling Convention:

"In the standard RISC-V calling convention, the stack grows downward and the stack pointer is always kept 16-byte aligned."

And from RISC-V Calling Conventions:

"The stack grows downwards (towards lower addresses) and the stack pointer shall be aligned to a 128-bit boundary upon procedure entry. The first argument passed on the stack is located at offset zero of the stack pointer on function entry; following arguments are stored at correspondingly higher addresses."

This apparently applies to both RV32 and RV64.

In any case, those computations are for stack pointer/frame alignment.

An int is 4 bytes, but the stack pointer needs to stay 16-byte aligned, so space is reserved in multiples of 16 bytes. For example, if n is between 1 and 4, we really need to allocate room for 4 ints (16 bytes) so that we keep 16-byte alignment.
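To make the numbers concrete, here is a minimal C sketch of the size computation that the slli/addi/andi sequence in the question performs. The function name vla_bytes_reserved is made up for illustration, and it assumes a 4-byte int and the add-15 form shown in the GCC output.

```c
#include <stdint.h>

/* Hypothetical helper: bytes reserved on the stack for `int array[n]`,
   mirroring the instruction sequence from the question. */
uint32_t vla_bytes_reserved(uint32_t n) {
    uint32_t raw = n << 2;        /* slli a5,a0,2  : n * 4 bytes          */
    return (raw + 15u) & ~15u;    /* addi a5,a5,15 then andi a5,a5,-16    */
}
```

With this sketch, n = 1 through 4 all reserve 16 bytes, n = 5 reserves 32 bytes, and n = 0 reserves nothing.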

Since the size is rounded up to preserve alignment (16-byte alignment also lines up well with cache line boundaries, and it's easy to compute), the compilers place the array at the aligned location, and any necessary padding goes after the array in memory. The alternative would be to compute the location of the array on the stack first, then add alignment padding in front of the array.

The initial addi #15/#16 (gcc vs. clang) rounds the size needed upwards. The final andi #-16 operation "floors" the computation and ensures that the result is 16-byte aligned. (These operations work on a stack that grows downward. A stack that grows upward, or a heap-allocated object, would require a slightly modified formula, though I believe heap objects are already aligned as needed, so that might only apply if you needed even more alignment than 16 bytes.)
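As a rough illustration of where the array and its padding end up, the sketch below models the stack pointer as a plain 32-bit number. The struct vla_layout and the function place_vla are hypothetical names, and the model assumes a downward-growing stack, a 4-byte int, and an incoming sp that is already 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one variable-length array allocation. */
typedef struct {
    uint32_t new_sp;    /* stack pointer after the allocation            */
    uint32_t array_lo;  /* address of array[0]                           */
    uint32_t array_hi;  /* one past the last byte used by the array      */
    uint32_t padding;   /* unused bytes between array_hi and the old sp  */
} vla_layout;

vla_layout place_vla(uint32_t sp, uint32_t n) {
    assert(sp % 16 == 0);                  /* assumed: sp aligned on entry */
    uint32_t raw = n * 4u;                 /* bytes the array really needs */
    uint32_t rounded = (raw + 15u) & ~15u; /* size rounded up to 16        */
    vla_layout l;
    l.new_sp = sp - rounded;               /* sub sp,sp,a5                 */
    l.array_lo = l.new_sp;                 /* mv a4,sp : array starts at sp */
    l.array_hi = l.array_lo + raw;
    l.padding = rounded - raw;             /* sits above the array         */
    return l;
}
```

For example, with a made-up sp of 0x7ffffff0 and n = 5, the array occupies 0x7fffffd0 up to 0x7fffffe4, followed by 12 bytes of padding, and the new sp is still a multiple of 16.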

If a frame pointer is used, it is supposed to be s0 (aka x8), so that's what GCC and Clang are doing with s0. Using -fomit-frame-pointer doesn't remove that; it looks like they both want to use the frame pointer to restore the stack (that is, to deallocate the variable-length local array).

Answer by Chris Dodd:

These two lines

    addi    a5,a5,15  //**WHY??**
    andi    a5,a5,-16  //**WHY??**

round the value in a5 up to a multiple of 16: the first line adds 15, and the second line clears the bottom 4 bits (which effectively rounds down to the nearest multiple of 16).

The reason for this is alignment: the RISC-V calling convention requires that the stack be 16-byte aligned at all times, so any variably-sized object allocated on the stack needs to have its size rounded up to a multiple of 16.
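The difference between the two instructions can be seen by writing each step as its own C function. This is only a sketch with invented names (round_down_16, round_up_16); it relies on the fact that -16, as a 32-bit two's-complement value, is the mask 0xFFFFFFF0.

```c
#include <stdint.h>

/* andi x,x,-16 on its own: clears the low four bits, so it rounds DOWN. */
uint32_t round_down_16(uint32_t x) {
    return x & (uint32_t)-16;     /* (uint32_t)-16 == 0xFFFFFFF0 */
}

/* addi x,x,15 followed by andi x,x,-16: rounds UP to a multiple of 16. */
uint32_t round_up_16(uint32_t x) {
    return round_down_16(x + 15u);
}
```

For a 20-byte array, round_down_16(20) gives 16, which would be too small, while round_up_16(20) gives 32. A size that is already a multiple of 16, such as 32, is left unchanged by both.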