Why doesn't this code produce the same assembly? (g++ -O3)
I know little about assembly, but it seems the case 2 access uses fewer instructions, so it should be preferred, right? I am asking because I wanted to implement a wrapper class with an access operator that returns a pointer, int* p = a[i] (so access is a[i][j] instead of a[i*3+j]), but I don't know if it's worth it. Thank you for any help.
#include <iostream>

int main() {
    int a[9];
    int i, j, k;

    // Case 1
    std::cin >> i >> j >> k;
    *(a + i*3 + j) = k;

    std::cin >> i >> j >> k;
    (&a[i*3])[j] = k;

    std::cin >> i >> j >> k;
    *((&a[i*3])+j) = k;

    // Case 2
    std::cin >> i >> j >> k;
    a[i*3 + j] = k;

    std::cout << a[0];
    return 0;
}
https://godbolt.org/z/13arxcPqz
Edit: For completeness, this variant, where a is moved to the right, compiles exactly as case 2, because operator+ associates left-to-right, so i*3 + j is now added first.
// Case 2 again
std::cin >> i >> j >> k;
*(i*3 + j + a) = k;
The expressions *(a + i*3 + j) and a[i*3 + j] are not equivalent at the level of C++. Since binary + associates left-to-right, the former is equivalent to *((a + i*3) + j), while the latter is equivalent to *(a + (i*3 + j)). They can produce different results if, for instance, the sum i*3 + j would overflow int.
For a concrete example, consider a 64-bit machine with 32-bit int, like your x86-64 system, and suppose we had i == 600'000'000 and j == 2'000'000'000. Suppose also that, instead of your array of length 9, a points into an extremely large array. The first expression adds 1'800'000'000 and then 2'000'000'000 to a, yielding a+3'800'000'000. The second adds 1'800'000'000+2'000'000'000 first, which overflows and causes undefined behavior. On some compilers, the behavior might be to "wrap around", yielding a+(-494'967'296), a completely different address that is 16 GB away from the other one.
The generated assembly reflects this distinction. In the second case, the addition i*3 + j is done as a plain 32-bit addition, which would wrap around on overflow. Since j is in memory, once we get i in a register, we can use a plain add r32, m32 instruction to do the addition. But in the first case, the addition of i*3 and j must be done as a 64-bit addition to yield correct pointer arithmetic. So j must be sign-extended to 64 bits before adding, and this cannot be done in a single memory-source add instruction. Instead, we first use movsx r64, m32 to load j into a register with sign extension, then add r64, r64 to do the 64-bit addition. This explains why it takes an extra instruction.
Which of the two "should be preferred" is less about efficiency and more about whether your code could conceivably be called with arguments that would overflow, and what you want to happen in that situation. Worry about correct behavior before optimizing.
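The numbers in the overflow example can be checked without actually triggering undefined behavior. The sketch below is only a model of the two index computations, not what any particular compiler emits: the function names are made up for illustration, it assumes 32-bit int with 4-byte elements, and it assumes i*3 by itself fits in int (as it does in the example). The "wrapped" variant imitates wraparound with unsigned arithmetic, which is well defined.

```cpp
#include <cstdint>

// Hypothetical helper: element offset for *(a + i*3 + j).
// i*3 and j are widened to 64 bits and added separately, as in pointer
// arithmetic on x86-64. Assumes i*3 itself fits in a 32-bit int.
constexpr std::int64_t offset_two_steps(std::int32_t i, std::int32_t j) {
    return std::int64_t{i} * 3 + std::int64_t{j};
}

// Hypothetical helper: element offset for a[i*3 + j] on an implementation
// where int overflow happens to wrap. Real signed overflow is undefined
// behavior, so the wraparound is modeled with unsigned arithmetic.
constexpr std::int64_t offset_one_step_wrapped(std::int32_t i, std::int32_t j) {
    const std::uint32_t sum =
        static_cast<std::uint32_t>(i) * 3u + static_cast<std::uint32_t>(j);
    // Read the 32-bit pattern as a signed value without relying on an
    // out-of-range conversion.
    return sum > 0x7FFFFFFFu ? std::int64_t{sum} - (std::int64_t{1} << 32)
                             : std::int64_t{sum};
}

// The values from the example above.
static_assert(offset_two_steps(600'000'000, 2'000'000'000) == 3'800'000'000LL,
              "two pointer-plus-index steps");
static_assert(offset_one_step_wrapped(600'000'000, 2'000'000'000) == -494'967'296LL,
              "32-bit sum that wrapped around");

// The two offsets are 2^32 elements apart, i.e. 16 GB with 4-byte int.
static_assert((offset_two_steps(600'000'000, 2'000'000'000) -
               offset_one_step_wrapped(600'000'000, 2'000'000'000)) * 4 ==
                  17'179'869'184LL,
              "16 GB apart");
```

For small indices that cannot overflow, both helpers return the same offset, which is why the choice only matters for inputs near the limits of int.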
Just to highlight the code I'm talking about:
*(a + i*3 + j) = k;
is performed at lines 12-13 and 16-20 in the asm code linked in the question.
Then the code for the next two versions, (&a[i*3])[j] = k; (28-29 and 30-36) and *((&a[i*3])+j) = k; (44-45 and 48-52), is the same; these also correspond to two "pointer plus index" steps and never do the int addition.
Whereas
a[i*3 + j] = k;
is at lines 60-65.
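As for the wrapper class mentioned in the question, here is a minimal sketch of what it might look like. The class name Grid3, the member at(), and the fixed row width of 3 are assumptions made for illustration. With operator[] returning a row pointer, a[i][j] is two "pointer plus index" steps, so it would presumably fall in the same group as the Case 1 forms; at(i, j) instead forms the flat index first, but in std::ptrdiff_t rather than int, so the sum is not limited to the range of int.

```cpp
#include <cstddef>

// Hypothetical wrapper over a flat buffer with 3 ints per row.
class Grid3 {
public:
    static constexpr std::ptrdiff_t kCols = 3;

    constexpr explicit Grid3(int* data) : data_(data) {}

    // a[i] yields a pointer to the start of row i, so a[i][j] is
    // "pointer plus index" twice, like (&a[i*3])[j].
    constexpr int* operator[](std::ptrdiff_t i) const {
        return data_ + i * kCols;
    }

    // at(i, j) computes the flat index first, like a[i*3 + j], but in a
    // pointer-sized signed type instead of int.
    constexpr int& at(std::ptrdiff_t i, std::ptrdiff_t j) const {
        return data_[i * kCols + j];
    }

private:
    int* data_;
};

// Usage: write through a[i][j], read back through at(i, j).
constexpr int roundtrip_demo() {
    int buf[9] = {};
    Grid3 a(buf);
    a[1][2] = 7;         // element 1*3 + 2 == 5 of buf
    return a.at(1, 2);   // same element via the flat index
}

static_assert(roundtrip_demo() == 7, "both access paths reach the same element");
```

Whether either form costs an extra instruction in a given program is best checked on the compiler output, as in the Godbolt link from the question.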