How to encourage undefined behavior / out-of-order execution in C programs?


I am reading the following article about sequence points in C: https://www.geeksforgeeks.org/sequence-points-in-c-set-1/

In it, there are several examples of undefined behavior, such as expressions that call two functions that modify a single global variable, or a single expression that increments the same variable more than once.
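For reference, the two patterns the article describes look roughly like this in code. This is a sketch; the names counter, bump_by_one, bump_by_ten, call_order_demo and sequenced_demo are hypothetical. Strictly speaking, the first pattern is unspecified rather than undefined behavior in C11. The two calls may run in either order, but they cannot interleave, so the result is one of two predictable values. The second pattern modifies the same variable twice without sequencing, which is genuinely undefined, so it appears only in a comment next to a well-defined rewrite.

```c
/* Hypothetical global that both helper functions modify. */
static int counter = 0;

int bump_by_one(void) { counter += 1; return counter; }
int bump_by_ten(void) { counter += 10; return counter; }

/* Pattern 1: the Standard does not say which call runs first.
   bump_by_one() first: 1 + 11 == 12; bump_by_ten() first: 10 + 11 == 21.
   The calls cannot interleave, so the result is unspecified, not undefined. */
int call_order_demo(void)
{
    counter = 0;
    return bump_by_one() + bump_by_ten();
}

/* Pattern 2: modifying the same variable twice without sequencing is
   undefined behavior, for example
       int i = 0;
       i = i++ + i++;    (undefined)
   Splitting it into separate statements makes the order explicit. */
int sequenced_demo(void)
{
    int i = 0;
    int first = i++;     /* first == 0, i becomes 1 */
    int second = i++;    /* second == 1, i becomes 2 */
    i = first + second;  /* i == 1 */
    return i;
}
```

A given binary executes the same instructions on every run, so repeated runs print the same value each time, which is one reason these examples rarely look surprising in practice. gcc's -Wall, which enables -Wsequence-point, warns about the commented-out second pattern.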

In theory, I understand the concept. However, no matter how many times I try to run the examples, the behavior is the same, and never "surprising."

For the purpose of getting a hands-on appreciation of undefined behavior, what's the easiest way to get the examples to be "surprising"?

(If it matters, I am using MINGW64.)



Answer 1

This is about the best I can come up with at short notice:

Source code:

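#include <stdio.h>

/* In main(), a and b point at the same int, but because they have
   different types the compiler may assume they never overlap. */
int undefined(int *a, short *b)
{
    *a = 1;
    b[0] = 0;   /* writing an int's storage through a short lvalue */
    b[1] = 0;   /* violates the strict aliasing rule */
    return *a;
}

int main(void)
{
    int x;
    short *y = (short*) &x;   /* the cast is fine; accessing x through y is not */
    int z = undefined(&x, y);
    printf("%d\n", z);
    return 0;
}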

Resulting assembly using gcc 8.3 -O3

undefined(int*, short*):
    mov     DWORD PTR [rdi], 1
    mov     eax, 1
    mov     DWORD PTR [rsi], 0
    ret
.LC0:
    .string "%d\n"
main:
    sub     rsp, 8
    mov     esi, 1
    mov     edi, OFFSET FLAT:.LC0
    xor     eax, eax
    call    printf
    xor     eax, eax
    add     rsp, 8
    ret

See it in action: https://godbolt.org/z/E0XDYt

In particular, it relies on accessing an int object through a short lvalue. Casting the address of x to short* is not itself the problem; writing to x through the resulting pointer breaks the strict aliasing rule, and that is what causes the undefined behavior.

Start with the assembly of undefined(). Because a and b point to different types, the compiler assumes they cannot overlap, so it turns return *a; into mov eax, 1, even though reading the value back from memory would give 0 (mov DWORD PTR [rsi], 0 is the two short stores merged into one 32-bit store, which overwrites the 1). With optimization off, the compiler does read the value back, so this is one of those insidious problems that shows up only in an optimized release build and not when you try to debug it with an unoptimized debug build.
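For contrast, here is a sketch of a version that stays within the rules. It assumes, as on common x86-64 targets, that an int is exactly two shorts wide, and the name well_defined is hypothetical. memcpy may legally overwrite the bytes of an object of any type, so the compiler can no longer assume that *a still holds 1 and must reload it.

```c
#include <string.h>

/* Assumption: int is exactly twice the size of short (true on common
   x86-64 targets); compilation fails here otherwise. */
_Static_assert(sizeof(int) == 2 * sizeof(short),
               "expects int to be exactly two shorts wide");

/* Hypothetical well-defined counterpart of undefined(): instead of
   writing through a short *, the zero shorts are copied into *a with
   memcpy, which may legally overwrite the bytes of any object. */
int well_defined(int *a)
{
    short zeros[2] = {0, 0};
    *a = 1;
    memcpy(a, zeros, sizeof zeros);  /* overwrites every byte of *a */
    return *a;                       /* must be reloaded, so this is 0 */
}
```

A quicker experiment with the original program is to add gcc's -fno-strict-aliasing flag to the -O3 build. That tells the compiler not to assume that int and short accesses never overlap, so it should reload *a and print 0, just like the unoptimized build.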

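Now look at main(). The call to undefined() has been inlined and folded away entirely: mov esi, 1 passes the constant 1 to printf as its second argument, i.e. as the value of z, and the stores to x are gone because nothing reads x afterwards. (The xor eax, eax just above the call to printf is not z; under the x86-64 System V calling convention it tells the variadic printf that no vector registers are used for arguments.) So the optimized build prints 1, whereas an unoptimized build, which really does read x back from memory, prints 0.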

All in all, a very badly broken program. Exactly what you risk with undefined behavior.

Answer 2

A useful pattern when testing gcc and clang is to access arrays using subscripts whose values are in bounds but not known to the compiler, and to compare subscript notation with the pointer syntax that the Standard describes as equivalent to it. For example, test gcc and clang with something like:

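#include <stdint.h>

struct S1 {int x;};
struct S2 {int x;};

/* arr1 and arr2 overlay the same storage */
union foo { struct S1 arr1[8]; struct S2 arr2[8]; } u;

/* Subscript form */
uint32_t test1(int i, int j)
{
  if (sizeof u.arr1 != sizeof u.arr2)
    return -99;
  if (u.arr1[i].x)
    u.arr2[j].x = 2;
  return u.arr1[i].x;      /* re-read: the store changes it when i == j */
}

/* Pointer form, which the Standard defines as equivalent */
uint32_t test2(int i, int j)
{
  if (sizeof u.arr1 != sizeof u.arr2)
    return -99;
  if ((u.arr1+i)->x)
    (u.arr2+j)->x = 2;
  return (u.arr1+i)->x;
}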

will reveal that although the Standard defines u.arr1[i].x and u.arr2[j].x as equivalent to (u.arr1+i)->x and (u.arr2+j)->x, respectively, gcc and clang forgo an allowable optimization in test1 that they do exploit in test2. In the pointer form they assume that the struct S1 and struct S2 accesses cannot refer to the same int, so they return the value read by the if test instead of re-reading it after the store, which yields the stale value rather than 2 when i == j. Most likely this is because the authors recognize that exploiting the opportunity in the subscript form would be permissible but so undeniably silly as to compel recognition that the Standard was never intended to encourage all the optimizations it allows.
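To make the optimization concrete, here is a hand-written sketch of what exploiting it in test2 amounts to. The function name test2_as_optimized is hypothetical, and this is an illustration of the transformation, not actual compiler output.

```c
#include <stdint.h>

struct S1 {int x;};
struct S2 {int x;};

union foo { struct S1 arr1[8]; struct S2 arr2[8]; } u;

/* Hypothetical hand-written equivalent of test2 after the optimization:
   (u.arr1+i)->x is read once and that value is returned, on the assumption
   that a store through struct S2 cannot modify a struct S1 member. */
uint32_t test2_as_optimized(int i, int j)
{
  int old = (u.arr1+i)->x;   /* single load */
  if (old)
    (u.arr2+j)->x = 2;       /* assumed not to touch arr1[i] */
  return old;                /* no reload after the store */
}
```

With i == j and u.arr1[i].x initially 1, executing test2 naively stores 2 into that same int and returns 2, while this version returns the stale 1. That difference is what to look for when comparing the code generated for test1 and test2.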