Taking sizeof of variable-length array — is there any benefit for doing so?


I am working on a piece of legacy code (no tests). I stumbled upon a section hidden inside several macros. It generates a warning if compiled with GCC's -Wvla.

The code in question is equivalent to what can be seen in this small program:

typedef struct entry {
    unsigned index;
    unsigned reserved;
    unsigned value;
} entry_t;

int main(int argc, char **argv) {
    long pa = 0;
    long res = pa + sizeof(entry_t[10 - argc]);
    return res;
}

When compiled, it gives out a warning:

$ gcc -g -Wvla repro-vla.c
repro-vla.c: In function ‘main’:
repro-vla.c:9:5: warning: ISO C90 forbids variable length array [-Wvla]
    9 |     long res = pa + sizeof(entry_t[10 - argc]);
      |     ^~~~

The culprit is of course this expression: sizeof(entry_t[10 - argc]). The syntax is a bit confusing here. I believe a temporary anonymous array for 10 - argc entries of type entry_t is created, then its size is taken, and the array is discarded.

My questions are:

  1. Is my understanding of the code as it is written now correct?
  2. How is that expression any different from sizeof(entry_t) * (10-argc)? Both calculate the same value, and neither one does anything to guard against the underflow (when argc >= 10). The second expression does not use variable length arrays and as such won't generate the warning, and it is also easier to comprehend in my opinion.

There are 3 best solutions below

1
tstanisl On BEST ANSWER

No temporary array object is created.

There is a common misconception that VLAs are about stack-allocated arrays whose length is determined at run time, a slightly safer form of alloca().

No. VLAs are about typing, not storage. The following line is the essence of "VLA-ness":

typedef int T[n];

not:

int A[n];

Note that VLA type declarations do not allocate any storage, and they can be used without any concern about running out of stack. VLA types are very useful for handling multidimensional arrays and for expressing access ranges in function arguments, e.g. void foo(int n, int arr[n]);. VLA types were optional in C11, but they will be mandatory in C23 due to their usefulness.
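To illustrate the multidimensional case, here is a minimal sketch. The function name sum_matrix and its parameter names are made up for this example. The parameter m has a variably modified type, but the function itself declares no array object.

```c
#include <stddef.h>

/* Hypothetical example: sum all elements of a rows x cols matrix.
   The type of m depends on the run-time values rows and cols,
   so m[r][c] indexes correctly without manual offset arithmetic. */
long sum_matrix(size_t rows, size_t cols, int m[rows][cols])
{
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r][c];
        }
    }
    return total;
}
```

A caller would pass an ordinary array, for example int m[2][3] together with the dimensions 2 and 3.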

The expression sizeof(entry_t[10 - argc]) is essentially the same as:

typedef entry_t _unnamed_type[10 - argc];
sizeof(_unnamed_type)

No VLA object is created there. I think the problem is the -Wvla flag itself: it warns about any declaration of a VLA type (not just a VLA object), which is overzealous because it also catches good uses of VLA types. There is a request to add a -Wvla-stack-allocation warning to clang to catch dangerous uses of VLAs. Alternatively, one can use gcc's -Wvla-larger-than=0, but it does not work very well.
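To make the point concrete, here is a small sketch. The names entries_bytes and block_t are hypothetical and not taken from the question's code base. The typedef declares a VLA type, sizeof is evaluated at run time, and no array object exists anywhere in the function. It assumes count is greater than zero.

```c
#include <stddef.h>

typedef struct entry {
    unsigned index;
    unsigned reserved;
    unsigned value;
} entry_t;

/* Hypothetical helper: size in bytes of an array type holding
   `count` entries. Only a type is declared; there is no array
   object, so no array storage is reserved.
   Precondition (assumed): count > 0. */
size_t entries_bytes(int count)
{
    typedef entry_t block_t[count]; /* VLA type; -Wvla still warns here */
    return sizeof(block_t);         /* computed at run time */
}
```

For any positive count this should return the same value as sizeof(entry_t) * count.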

0
Vlad from Moscow On

In the expression sizeof(entry_t[10 - argc]), no array is created. The expression 10 - argc is evaluated, and the size of such an array is calculated from the type entry_t. entry_t[10 - argc] is a type name and not an expression, so the parentheses are required. For example, you may not write

sizeof entry_t[10 - argc]

The expressions sizeof(entry_t[10 - argc]) and sizeof(entry_t) * (10-argc) yield the same value, provided that 10 is greater than argc, because the size of an array is equal to the number of elements in the array multiplied by the size of its element.

Regarding the number of elements in a VLA, there is a restriction in the C Standard that is "each time it is evaluated it shall have a value greater than zero."

Note that in the expression sizeof(entry_t) * (10-argc) you can get a very big value of the type size_t when argc is greater than 10 due to the usual arithmetic conversions.
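A short sketch of that wraparound follows. The helper name bytes_by_multiplication is invented for illustration, and the sketch assumes the common case where size_t is at least as wide as int, so the int operand is converted to size_t.

```c
#include <stddef.h>

typedef struct entry {
    unsigned index;
    unsigned reserved;
    unsigned value;
} entry_t;

/* Hypothetical helper mirroring sizeof(entry_t) * (10 - argc).
   (10 - argc) has type int; before the multiplication it is
   converted to size_t, so a negative count wraps around to a
   very large unsigned value instead of producing an error. */
size_t bytes_by_multiplication(int argc)
{
    return sizeof(entry_t) * (10 - argc);
}
```

With argc equal to 10 the result is 0, and with argc equal to 11 the result is the unsigned wraparound of -sizeof(entry_t), which is close to SIZE_MAX.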

Also this line

long res = pa + sizeof(entry_t[10 - argc]);

raises the question of why a value of the unsigned integer type size_t is assigned to a variable of the signed integer type long.

2
chux - Reinstate Monica On

Taking sizeof of variable-length array — is there any benefit for doing so?

Benefit with the alternative code

sizeof(entry_t) * (10-argc) has a defined product*1 when argc >= 10. sizeof(entry_t[10 - argc]) is undefined behavior (UB) in that case as it fails to meet "If the size is an expression that is not an integer constant expression: ... each time it is evaluated it shall have a value greater than zero." C17dr § 6.7.6.2 5 (Array declarators)

The next steps of the code, converting to long and then to int, have implementation-defined behavior when sizeof(entry_t) * (10-argc) exceeds LONG_MAX or INT_MAX.
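If the legacy code is ever reworked, one option is to make the range check explicit so that neither the UB nor the wraparound can occur. This is only a sketch under assumptions: the helper name remaining_entry_bytes and its out-parameter convention are hypothetical, and the limit of 10 is taken from the question's example.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct entry {
    unsigned index;
    unsigned reserved;
    unsigned value;
} entry_t;

/* Hypothetical guarded variant: stores the byte size of
   (10 - argc) entries in *out and returns true, or returns
   false without touching *out when the count would not be
   positive (argc >= 10) or argc is negative. */
bool remaining_entry_bytes(int argc, size_t *out)
{
    if (argc < 0 || argc >= 10) {
        return false;
    }
    /* 10 - argc is now in the range 1..10, so the cast is safe. */
    *out = sizeof(entry_t) * (size_t)(10 - argc);
    return true;
}
```

The caller then decides what to do on failure instead of silently continuing with a huge or undefined size.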

Small advantage

Given the UB when argc >= 10, an optimizing compiler can assume argc < 10 and then emit code that takes advantage of the return value being in a narrow range (e.g. [3*4 ... 3*4*9]), thus only needing narrow integer math. Not much of an advantage, but there it is.


*1 The product certainly has type size_t.