How does GCC optimize this `switch`

357 Views Asked by At

Today I discovered that GCC does some amazing magic for optimizing switches in this code:

StairsType GetStairsType(uint8_t tileId, uint8_t dlvl)
{
    if (dlvl == 0)
        return StairsType::Part;
    if (tileId == 48) {
        return dlvl >= 21 ? /* Crypt */ StairsType::Down : /* Caves */ StairsType::Part;
    }
    switch (tileId) {
    case 57: return StairsType::Down;
    // many more tile IDs go here
    }
}

Have a look at this :

https://godbolt.org/z/snY3jv8Wz

(gcc is on the left, clang is on the right)

Somehow GCC manages to compile this to 1/4 of the code compared to Clang.

  1. What does this asm do, conceptually?
  2. How does GCC do this?
1

There are 1 best solutions below

2
On BEST ANSWER

Both compilers compile this part in the obvious way:

    if (dlvl == 0)
        return StairsType::Part;

When it comes to this part:

    if (tileId == 48) {
        // This ID is used by both Caves and Crypt.
        return dlvl >= 21 ? /* Crypt */ StairsType::Down : /* Caves */ StairsType::Part;
    }

gcc checks tileId==48 directly, while clang decides to merge it into the switch statement.

Both compilers decide to calculate dlvl >= 21 ? 1 : 4 branchlessly in more-or-less the same way, but with different exact instructions. GCC sneakily uses the carry flag to calculate ((dlvl >= 21 ? 0 : -1) & 3) + 1 while clang straightforwardly calculates ((dlvl < 21) * 3) + 1.

// GCC
cmp     sil, 21
sbb     eax, eax
and     eax, 3
add     eax, 1

// clang
xor     eax, eax
cmp     sil, 21
setb    al
lea     eax, [rax + 2*rax]
inc     eax

When it comes to the switch statement, clang implements it with a jump table. The lowest entry is 16 and the highest is 160, so it subtracts 16 and then checks whether the number is greater than 144. There's a well-known trick to save one check, where numbers smaller than 16 wrap around to very big numbers, so they're greater than 144. For whatever reason, it chooses to preload the number 1 (StairsType::Down) into the return value register before doing the jump.

Meanwhile on gcc, it first checks if the tile ID is between 16 and 78. If so, it subtracts 16 and checks some bitmasks:

if(tileID < 16) {
    return 0;
} else if(tileID <= 78) {
    mask = (1 << (tileID - 16));
    if(2307251517974380578 & mask) return 2;
    if(4611688220747366401 & mask) return 1;
    return ((421920768 & mask) != 0) << 2;
} else {
    tileID += 125; // The same as tileID -= 131
    // because it's being treated as a byte
    if(tileID > 29) { // values lower than 131 wrapped around
        return 0;
    }

    // notice that if we get here it means the tileID is >= 131 and <= 160
    // << only looks at the bottom 5 bits of tileID (i.e. tileID & 31)
    return -((541065279 & (1 << tileID)) != 0) & 3;
}

Let's try that again, but with the bitmasks in binary and let's figure out what those return statements return:

if(tileID < 16) {
    return StairsType::Invalid;
} else if(tileID <= 78) {
    mask = (1 << (tileID - 16));
    // tile 17, 21, 35, 36, 38, 51, 56, 64, 66, 77
    if(0010000000000101000000010000100000000000010110000000000000100010 & mask) return StairsType::Up;
    // tile 16, 39, 40, 46, 47, 57, 78
    if(0100000000000000000000100000000011000100100000000000000000000001 & mask) return StairsType::Down;
    // tile 33, 34, 37, 40, 43, 44
    return ((00011001001001100000000000000000 & mask) != 0) ? StairsType::Part : StairsType::Invalid;
} else {
    if(tileID < 131 || tileID > 160) {
        return 0;
    }

    // tile 131, 132, 133, 134, 135, 136, 153, 160
    return (00100000010000000000000000111111 & (1 << (tileID - 131))) ? StairsType::ShortcutToTown : StairsType::Invalid;
}

Seems that the compiler noticed that you grouped your tile IDs into somewhat logical groups. E.g. above 130 there are only shortcuts to town.

I have no idea how compiler writers come up with this stuff.