Performance of alloca


I'm using alloca a lot these days, to allocate temporary buffers. In my application (signal processing) this is a common need.

The question is:

When allocating multiple arrays, is it better (performance-wise) to use alloca just once?

Like this:

float *array1 = (float*)alloca(4096 * 4);
float *array2 = array1 + 1024;
float *array3 = array2 + 1024;
float *array4 = array3 + 1024;

Or use it multiple times like this:

float *array1 = (float*)alloca(4096);
float *array2 = (float*)alloca(4096);
float *array3 = (float*)alloca(4096);
float *array4 = (float*)alloca(4096);

As far as I can tell, all it does is decrement the stack pointer and possibly perform a "stack probe", whose cost depends on the size, so perhaps it doesn't matter?


There are 2 best solutions below

0
On

alloca is designed to be faster than malloc because of how allocation and deallocation are performed and which region of memory is used, as I'm sure you know. It is also, as stated in the comments, very easy to get wrong.

To the point: my guess was that the version that repeats alloca would be faster, in an unoptimized build, than the version that calls alloca once and derives the other pointers by pointer arithmetic, and some benchmarks confirmed this.

The tests were performed using Google Benchmark, clang 10.0, the C++20 standard and no optimization. Repeated runs gave consistent results. The benchmarked functions use code similar to the OP's:

#include <alloca.h>
#include <benchmark/benchmark.h>

void alloc1(){
    float *array1 = (float*)alloca(4096 * 4);
    float *array2 = array1 + 1024;
    float *array3 = array2 + 1024;
    float *array4 = array3 + 1024;
}

void alloc2(){
    float *array1 = (float*)alloca(4096);
    float *array2 = (float*)alloca(4096);
    float *array3 = (float*)alloca(4096);
    float *array4 = (float*)alloca(4096);
}

static void alloca1_test(benchmark::State& state) {
    for (auto _ : state) {
        alloc1();
        //benchmark::DoNotOptimize();
    }
}
BENCHMARK(alloca1_test);

static void alloca2_test(benchmark::State& state) {
    for (auto _ : state) {
        alloc2();
        //benchmark::DoNotOptimize();
    }
}
BENCHMARK(alloca2_test);

// Needed only when building standalone; omit if the harness supplies main().
BENCHMARK_MAIN();

With O3 optimization added, as one would expect, the results even out. The multiple-alloca version is still consistently slightly faster, but the difference in performance is negligible. As you stated, it's basically the same: using one or the other seems to make little to no difference.


Disclaimer:

To best understand the performance of your program, integrated testing will give you a more accurate reading than isolated testing like that done here. The build tools and the environment also affect the end result, so to measure the performance of your options fully and accurately you must test them yourself.
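As a rough illustration of testing this yourself without a benchmark library, here is a minimal sketch that times both variants while actually touching the buffers. It is not the code used for the results above. It assumes glibc's `<alloca.h>` on Linux and a GCC or Clang compiler (the `escape` barrier uses their inline-asm extension); the names `kFloats`, `escape`, `touch`, `use_single`, `use_multi` and `ns_per_call` are hypothetical.

```cpp
#include <alloca.h>   // assumption: glibc/Linux; alloca is not standard C++
#include <chrono>
#include <cstddef>

constexpr std::size_t kFloats = 1024;  // 4096 bytes per array, as in the question

// Compiler barrier (GCC/Clang extension): the optimizer must assume the memory
// behind p is read and written, so the allocation cannot be optimized away.
static inline void escape(void *p) { asm volatile("" : : "g"(p) : "memory"); }

// Writes to all four buffers, then reads a few elements back.
static float touch(float *a, float *b, float *c, float *d, float seed) {
    for (std::size_t i = 0; i < kFloats; ++i) {
        a[i] = seed;
        b[i] = seed + 1.0f;
        c[i] = seed + 2.0f;
        d[i] = seed + 3.0f;
    }
    escape(a); escape(b); escape(c); escape(d);
    return a[kFloats - 1] + b[0] + c[0] + d[kFloats - 1];
}

// Variant 1: one alloca call, carved into four arrays.
float use_single(float seed) {
    float *a1 = static_cast<float*>(alloca(4 * kFloats * sizeof(float)));
    float *a2 = a1 + kFloats;
    float *a3 = a2 + kFloats;
    float *a4 = a3 + kFloats;
    return touch(a1, a2, a3, a4, seed);
}

// Variant 2: four separate alloca calls.
float use_multi(float seed) {
    float *a1 = static_cast<float*>(alloca(kFloats * sizeof(float)));
    float *a2 = static_cast<float*>(alloca(kFloats * sizeof(float)));
    float *a3 = static_cast<float*>(alloca(kFloats * sizeof(float)));
    float *a4 = static_cast<float*>(alloca(kFloats * sizeof(float)));
    return touch(a1, a2, a3, a4, seed);
}

// Average wall-clock nanoseconds per call of f over `iterations` calls.
template <typename F>
double ns_per_call(F f, int iterations) {
    volatile float sink = 0.0f;  // keeps each result observable
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = f(static_cast<float>(i & 7));
    }
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```

Calling `ns_per_call(use_single, 1000000)` and `ns_per_call(use_multi, 1000000)` gives two numbers to compare. In this sketch the time spent filling the buffers will likely dominate the allocation cost, which is itself a hint about how much the choice matters in real code.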

1
On

alloca is useful when you don't know the size of your array up front, i.e. at compile time.

For the given code, it is not different from simply writing:

float array1[1024];
float array2[1024];
float array3[1024];
float array4[1024];

Frankly, I don't see the need for benchmarking; alloca just bumps the stack pointer to make space for your allocation, just as declaring those arrays does.
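To illustrate the distinction, here is a minimal sketch, assuming glibc's `<alloca.h>` on Linux (MSVC provides `_alloca` in `<malloc.h>` instead). The names `sum_ramp`, `sum_ramp_fixed` and `kMaxFloats` are hypothetical. The first function needs alloca because its size arrives at run time; the second has a compile-time size, so a plain local array does the same job.

```cpp
#include <alloca.h>   // assumption: glibc/Linux; alloca is not standard C++
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxFloats = 4096;  // hypothetical cap to bound stack use

// Size known only at run time: this is where alloca earns its keep.
float sum_ramp(std::size_t n) {
    // alloca cannot report failure, so guard the size before allocating.
    assert(n > 0 && n <= kMaxFloats);
    float *scratch = static_cast<float*>(alloca(n * sizeof(float)));
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<float>(i);
    }
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += scratch[i];
    }
    return total;  // scratch is released automatically on return
}

// Size known at compile time: an ordinary array reserves the same stack space.
float sum_ramp_fixed() {
    float scratch[1024];
    for (std::size_t i = 0; i < 1024; ++i) {
        scratch[i] = static_cast<float>(i);
    }
    float total = 0.0f;
    for (std::size_t i = 0; i < 1024; ++i) {
        total += scratch[i];
    }
    return total;
}
```

Both functions give the same result for 1024 elements; the only difference is whether the size is fixed when the code is compiled.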