Why doesn't this code produce the same assembly? (g++ -O3)
I know little of assembly but it seems case 2 accessing has less instructions, so should be preferred, right? I am asking this because I wanted to implement a wrapper class with an access operator that returns a pointer int* p = a[i] (so access is a[i][j], instead of a[i*3+j]), but don't know if it's worth it. Thank you for any help.
#include <iostream>
int main() {
int a[9];
int i, j, k;
// Case 1
std::cin >> i >> j >> k;
*(a + i*3 + j) = k;
std::cin >> i >> j >> k;
(&a[i*3])[j] = k;
std::cin >> i >> j >> k;
*((&a[i*3])+j) = k;
// Case 2
std::cin >> i >> j >> k;
a[i*3 + j] = k;
std::cout << a[0];
return 0;
}
https://godbolt.org/z/13arxcPqz
Edit: For completeness, this change where a is moved to the right is exactly as in case 2, as the operator+ now associates left.
// Case 2 again
std::cin >> i >> j >> k;
*(i*3 + j + a) = k;
The expressions
*(a + i*3 + j)anda[i*3 + j]are not equivalent at the level of C++. Since binary+associates left-to-right, the former is equivalent to*((a + i*3) + j)while the latter is equivalent to*(a + (i*3 + j)). They can produce different results if, for instance, the sum ini*3 + jwould overflowint.For a concrete example, consider a 64-bit machine with 32-bit
intlike your x86-64 system, and suppose we hadi == 600'000'000andj == 2'000'000'000. Suppose, instead of your array of length 9, thatapoints into an extremely big array on a 64-bit. The first expression adds1'800'000'000and then2'000'000'000toa, yieldinga+3'800'000'000. The second adds1'800'000'000+2'000'000'000first, which overflows and causes undefined behavior. On some compilers, the behavior might be to "wrap around", yieldinga+(-494'967'296), a completely different address that is 16 GB away from the other one.The generated assembly reflects this distinction. In the second case, the addition
i*3 + jis done as plain 32-bit addition, which would wrap around on overflow. Sincejis in memory, once we getiin a register, we can use a plainadd r32, m32instruction to do the addition. But in the first case,i*3 + jmust be done as a 64-bit addition to yield correct pointer arithmetic. Sojmust be sign-extended to 64 bits before adding, and this cannot be done in a single memory-source add instruction. Instead, we first usemovsx r64, m32to loadjinto a register with sign extension, thenadd r64, r64to do the 64-bit addition. This explains why it takes an extra instruction.Which of the two "should be preferred" is less about the efficiency and more about whether your code could conceivably be called with arguments that would overflow, and what you want to have happen in that situation. Worry about correct behavior before optimizing.
Just to highlight the code I'm talking about:
*(a + i*3 + j) = k;is performed at lines 12-13 and 16-20 in the asm code linked in the question:Then the code for the next two versions,
(&a[i*3])[j] = k;(28-29 and 30-36) and*((&a[i*3])+j) = k;(44-45 and 48-52) is the same; these also correspond to two "pointer plus index" steps and never do theintaddition.Whereas
a[i*3 + j] = k;is at lines 60-65: