Disable named return value optimization in gcc for pure C


I could not find a flag that controls named return value optimization (NRVO) for C. For C++ it seems to be -fno-elide-constructors.

The source code implementing it is here, but since it is a middle-end pass, it reveals nothing about front-end options, even in the comments. The manual section did not help much either. However, disassembly shows that the optimization is off at -O0 and on at -O1, so it must be one of the following flags that -O1 enables:

      -fauto-inc-dec 
      -fcprop-registers 
      -fdce 
      -fdefer-pop 
      -fdelayed-branch 
      -fdse 
      -fguess-branch-probability 
      -fif-conversion2 
      -fif-conversion 
      -finline-small-functions 
      -fipa-pure-const 
      -fipa-reference 
      -fmerge-constants
      -fsplit-wide-types 
      -ftree-builtin-call-dce 
      -ftree-ccp 
      -ftree-ch 
      -ftree-copyrename 
      -ftree-dce 
      -ftree-dominator-opts 
      -ftree-dse 
      -ftree-fre 
      -ftree-sra 
      -ftree-ter 
      -funit-at-a-time

C code:

struct p {
    long x;
    long y;
    long z;
};

__attribute__((noinline))
struct p f(void) {
    struct p copy;
    copy.x = 1; 
    copy.y = 2;
    copy.z = 3;
    return copy;
}

int main(int argc, char** argv) {
    volatile struct p inst = f();
    return 0;
}

Compiled with -O0, the 'copy' structure is naively allocated on the stack and then copied to the caller's buffer:

00000000004004b6 <f>:
  4004b6:   55                      push   rbp
  4004b7:   48 89 e5                mov    rbp,rsp
  4004ba:   48 89 7d d8             mov    QWORD PTR [rbp-0x28],rdi
  4004be:   48 c7 45 e0 01 00 00    mov    QWORD PTR [rbp-0x20],0x1
  4004c5:   00 
  4004c6:   48 c7 45 e8 02 00 00    mov    QWORD PTR [rbp-0x18],0x2
  4004cd:   00 
  4004ce:   48 c7 45 f0 03 00 00    mov    QWORD PTR [rbp-0x10],0x3
  4004d5:   00 
  4004d6:   48 8b 45 d8             mov    rax,QWORD PTR [rbp-0x28]
  4004da:   48 8b 55 e0             mov    rdx,QWORD PTR [rbp-0x20]
  4004de:   48 89 10                mov    QWORD PTR [rax],rdx
  4004e1:   48 8b 55 e8             mov    rdx,QWORD PTR [rbp-0x18]
  4004e5:   48 89 50 08             mov    QWORD PTR [rax+0x8],rdx
  4004e9:   48 8b 55 f0             mov    rdx,QWORD PTR [rbp-0x10]
  4004ed:   48 89 50 10             mov    QWORD PTR [rax+0x10],rdx
  4004f1:   48 8b 45 d8             mov    rax,QWORD PTR [rbp-0x28]
  4004f5:   5d                      pop    rbp
  4004f6:   c3                      ret    

Compiled with -O1, no local copy is allocated; the fields are written directly through the pointer passed as an implicit argument (in rdi):

00000000004004b6 <f>:
  4004b6:   48 89 f8                mov    rax,rdi
  4004b9:   48 c7 07 01 00 00 00    mov    QWORD PTR [rdi],0x1
  4004c0:   48 c7 47 08 02 00 00    mov    QWORD PTR [rdi+0x8],0x2
  4004c7:   00 
  4004c8:   48 c7 47 10 03 00 00    mov    QWORD PTR [rdi+0x10],0x3
  4004cf:   00 
  4004d0:   c3                      ret 
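
For readers who prefer C to assembly, the two listings can be modeled roughly as follows. This is only an illustrative sketch: the names f_with_copy, f_in_place, ret_slot and lowerings_agree are hypothetical, and the explicit pointer parameter stands in for the hidden argument that the x86-64 System V ABI passes in rdi (and returns in rax) when a struct of this size is returned in memory.

```c
#include <string.h>

struct p {
    long x;
    long y;
    long z;
};

/* Rough model of the -O0 listing: fill a stack-local struct,
 * then copy it field by field into the caller-provided slot. */
struct p *f_with_copy(struct p *ret_slot) {
    struct p copy;
    copy.x = 1;
    copy.y = 2;
    copy.z = 3;
    memcpy(ret_slot, &copy, sizeof copy);
    return ret_slot; /* the hidden pointer is handed back to the caller */
}

/* Rough model of the -O1 listing: the local is gone and the
 * fields are stored straight into the caller-provided slot. */
struct p *f_in_place(struct p *ret_slot) {
    ret_slot->x = 1;
    ret_slot->y = 2;
    ret_slot->z = 3;
    return ret_slot;
}

/* Hypothetical helper: returns 1 if both models produce the same struct. */
int lowerings_agree(void) {
    struct p a, b;
    f_with_copy(&a);
    f_in_place(&b);
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

Both functions leave the same values in the caller's struct; the only difference is the intermediate stack copy, which is exactly what the optimization removes.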

There is 1 answer below.

The closest thing GCC offers to a copy-elision switch for C is -fcprop-registers. Copy elision as a language feature doesn't exist in C, but this is the most similar optimization. From the man page:

After register allocation and post-register allocation instruction splitting, we perform a copy-propagation pass to try to reduce scheduling dependencies and occasionally eliminate the copy. Enabled at levels -O, -O2, -O3, -Os.