Today I discovered that GCC does some amazing magic for optimizing switches in this code:
StairsType GetStairsType(uint8_t tileId, uint8_t dlvl)
{
if (dlvl == 0)
return StairsType::Part;
if (tileId == 48) {
return dlvl >= 21 ? /* Crypt */ StairsType::Down : /* Caves */ StairsType::Part;
}
switch (tileId) {
case 57: return StairsType::Down;
// many more tile IDs go here
}
}
Have a look at this :
https://godbolt.org/z/snY3jv8Wz
(gcc is on the left, clang is on the right)
Somehow GCC manages to compile this to 1/4 of the code compared to Clang.
- What does this asm do, conceptually?
- How does GCC do this?
Both compilers compile this part in the obvious way:
When it comes to this part:
gcc checks
tileId==48
directly, while clang decides to merge it into the switch statement.Both compilers decide to calculate
dlvl >= 21 ? 1 : 4
branchlessly in more-or-less the same way, but with different exact instructions. GCC sneakily uses the carry flag to calculate((dlvl >= 21 ? 0 : -1) & 3) + 1
while clang straightforwardly calculates((dlvl < 21) * 3) + 1
.When it comes to the
switch
statement, clang implements it with a jump table. The lowest entry is 16 and the highest is 160, so it subtracts 16 and then checks whether the number is greater than 144. There's a well-known trick to save one check, where numbers smaller than 16 wrap around to very big numbers, so they're greater than 144. For whatever reason, it chooses to preload the number 1 (StairsType::Down
) into the return value register before doing the jump.Meanwhile on gcc, it first checks if the tile ID is between 16 and 78. If so, it subtracts 16 and checks some bitmasks:
Let's try that again, but with the bitmasks in binary and let's figure out what those return statements return:
Seems that the compiler noticed that you grouped your tile IDs into somewhat logical groups. E.g. above 130 there are only shortcuts to town.
I have no idea how compiler writers come up with this stuff.