Why aren't array access and pointer arithmetic equivalent with full optimization?

Question

Why aren't array access and pointer arithmetic equivalent with full optimization?

128 Views Asked by lucmobz At 17 August 2025 at 20:04

Why doesn't this code produce the same assembly? (g++ -O3) I know little of assembly but it seems case 2 accessing has less instructions, so should be preferred, right? I am asking this because I wanted to implement a wrapper class with an access operator that returns a pointer int* p = a[i] (so access is a[i][j], instead of a[i*3+j]), but don't know if it's worth it. Thank you for any help.

#include <iostream>

int main() {
    
    int a[9];
    int i, j, k;

    // Case 1
    std::cin >> i >> j >> k;
    *(a + i*3 + j) = k;

    std::cin >> i >> j >> k;
    (&a[i*3])[j] = k;

    std::cin >> i >> j >> k;
    *((&a[i*3])+j) = k;

    // Case 2
    std::cin >> i >> j >> k;
    a[i*3 + j] = k;

    std::cout << a[0];

    return 0;
}

https://godbolt.org/z/13arxcPqz

Edit: For completeness, this change where a is moved to the right is exactly as in case 2, as the operator+ now associates left.

// Case 2 again
std::cin >> i >> j >> k;
        *(i*3 + j + a) = k;

https://godbolt.org/z/x89453aK4

Original Q&A

There are 2 best solutions below

A M On 20 June 2021 at 08:36

Please check the assemblyoutput carefully. Sorted by colors.

The output is the same.

For the last case the assembly for the IO operation is different.

But basically all the same.

Even, if C++ language and rules require different implementation, the optimzing compiler will produce the optimized same code.

// 1.
        mov     eax, DWORD PTR [rsp+4]
        movsx   rdx, DWORD PTR [rsp+8]
        lea     eax, [rax+rax*2]
        cdqe
        add     rax, rdx
        mov     edx, DWORD PTR [rsp+12]
        mov     DWORD PTR [rsp+16+rax*4], edx
// 2.
        mov     eax, DWORD PTR [rsp+4]
        movsx   rdx, DWORD PTR [rsp+8]
        lea     eax, [rax+rax*2]
        cdqe
        add     rax, rdx
        mov     edx, DWORD PTR [rsp+12]
        mov     DWORD PTR [rsp+16+rax*4], edx
// 3.
        mov     eax, DWORD PTR [rsp+4]
        movsx   rdx, DWORD PTR [rsp+8]
        lea     eax, [rax+rax*2]
        cdqe
        add     rax, rdx
        mov     edx, DWORD PTR [rsp+12]
        mov     DWORD PTR [rsp+16+rax*4], edx

**Nate Eldredge** · Accepted Answer

The expressions *(a + i*3 + j) and a[i*3 + j] are not equivalent at the level of C++. Since binary + associates left-to-right, the former is equivalent to *((a + i*3) + j) while the latter is equivalent to *(a + (i*3 + j)). They can produce different results if, for instance, the sum in i*3 + j would overflow int.

For a concrete example, consider a 64-bit machine with 32-bit int like your x86-64 system, and suppose we had i == 600'000'000 and j == 2'000'000'000. Suppose, instead of your array of length 9, that a points into an extremely big array on a 64-bit. The first expression adds 1'800'000'000 and then 2'000'000'000 to a, yielding a+3'800'000'000. The second adds 1'800'000'000+2'000'000'000 first, which overflows and causes undefined behavior. On some compilers, the behavior might be to "wrap around", yielding a+(-494'967'296), a completely different address that is 16 GB away from the other one.

The generated assembly reflects this distinction. In the second case, the addition i*3 + j is done as plain 32-bit addition, which would wrap around on overflow. Since j is in memory, once we get i in a register, we can use a plain add r32, m32 instruction to do the addition. But in the first case, i*3 + j must be done as a 64-bit addition to yield correct pointer arithmetic. So j must be sign-extended to 64 bits before adding, and this cannot be done in a single memory-source add instruction. Instead, we first use movsx r64, m32 to load j into a register with sign extension, then add r64, r64 to do the 64-bit addition. This explains why it takes an extra instruction.

Which of the two "should be preferred" is less about the efficiency and more about whether your code could conceivably be called with arguments that would overflow, and what you want to have happen in that situation. Worry about correct behavior before optimizing.

Just to highlight the code I'm talking about: *(a + i*3 + j) = k; is performed at lines 12-13 and 16-20 in the asm code linked in the question:

        mov     eax, DWORD PTR [rsp+4]        ; eax = i, zero-extend
        movsx   rdx, DWORD PTR [rsp+8]        ; rdx = (int64_t)j, sign-extend to 64 bits
        ;;; lea     rsi, [rsp+4]              ; unrelated, set up args for next cin
        ;;; mov     edi, OFFSET FLAT:_ZSt3cin ; unrelated, set up args for next cin
        lea     eax, [rax+rax*2]              ; eax = i*3, still 32 bits
        cdqe                                  ; rax = (int64_t)i*3, sign-extended
        add     rax, rdx                      ; rax = (int64_t)(i*3) + (int64_t)j
        mov     edx, DWORD PTR [rsp+12]       ; edx = k
        mov     DWORD PTR [rsp+16+rax*4], edx ; perform the store

Then the code for the next two versions, (&a[i*3])[j] = k; (28-29 and 30-36) and *((&a[i*3])+j) = k; (44-45 and 48-52) is the same; these also correspond to two "pointer plus index" steps and never do the int addition.

Whereas a[i*3 + j] = k; is at lines 60-65:

        mov     eax, DWORD PTR [rsp+4]        ; eax = i
        mov     edx, DWORD PTR [rsp+12]       ; edx = k
        lea     eax, [rax+rax*2]              ; eax *= 3
        add     eax, DWORD PTR [rsp+8]        ; eax += j (32 bit add!)
        cdqe                                  ; rax = (int64_t)(i*3+j)
        mov     DWORD PTR [rsp+16+rax*4], edx ; do the store

Why aren't array access and pointer arithmetic equivalent with full optimization?

There are 2 best solutions below

Related Questions in C++

Related Questions in ASSEMBLY

Related Questions in C++20

Related Questions in POINTER-ARITHMETIC

Related Questions in ARRAYACCESS

Trending Questions

Popular # Hahtags

Popular Questions