I'm inspecting copy elision for trivially copyable and non-trivially copyable types when one function's by-value return is passed directly as a by-value argument to another function. In the non-trivial case the object is handed over directly, as expected, but in the trivial case the returned object appears to be copied on the stack to create the second function's argument. My question is: why?
If this is expected, it is surprising, because it means the non-trivially copyable type is passed between these functions more efficiently.
Source:
struct Trivial_Struct
{
    unsigned char bytes[ 4 * sizeof( void* ) ];
};

struct Nontrivial_Struct
{
    unsigned char bytes[ 4 * sizeof( void* ) ];
    Nontrivial_Struct( Nontrivial_Struct const& );
};

Trivial_Struct trivial_struct_source();
Nontrivial_Struct nontrivial_struct_source();

void trivial_struct_sink( Trivial_Struct );
void nontrivial_struct_sink( Nontrivial_Struct );

void test_trivial_struct()
{
    trivial_struct_sink( trivial_struct_source() );
}

void test_nontrivial_struct()
{
    nontrivial_struct_sink( nontrivial_struct_source() );
}
GCC Output Assembly:
test_trivial_struct():
    sub rsp, 40
    mov rdi, rsp
    call trivial_struct_source()
    push QWORD PTR [rsp+24]
    push QWORD PTR [rsp+24]
    push QWORD PTR [rsp+24]
    push QWORD PTR [rsp+24]
    call trivial_struct_sink(Trivial_Struct)
    add rsp, 72
    ret

test_nontrivial_struct():
    sub rsp, 40
    mov rdi, rsp
    call nontrivial_struct_source()
    mov rdi, rsp
    call nontrivial_struct_sink(Nontrivial_Struct)
    add rsp, 40
    ret
I compiled this on godbolt.org with GCC, Clang, and MSVC. GCC's assembly is the easiest for me to read, but all three compilers seem to generate similar code for the trivially copyable case.
Misc:
- Apparently, I can accidentally make 'Nontrivial_Struct' trivial if I declare the copy constructor inside the class definition as Nontrivial_Struct( Nontrivial_Struct const& ) = default; but if I instead add Nontrivial_Struct::Nontrivial_Struct( Nontrivial_Struct const& ) = default; after the class definition, it remains non-trivial.
- I can change the '4' to larger values, such as '64', and it still occurs.
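The first observation can be checked at compile time. The following is a minimal sketch, not part of the original test case; the struct names Defaulted_In_Class and Defaulted_Out_Of_Class are hypothetical and chosen only for illustration. A copy constructor defaulted on its first declaration is not user-provided, so it can remain trivial, while one that is declared in the class and defaulted afterwards counts as user-provided and is therefore non-trivial.

```cpp
#include <type_traits>

// Hypothetical name: copy constructor defaulted on its first declaration.
// It is not user-provided, so the type stays trivially copyable.
struct Defaulted_In_Class
{
    unsigned char bytes[ 4 * sizeof( void* ) ];
    Defaulted_In_Class( Defaulted_In_Class const& ) = default;
};

// Hypothetical name: copy constructor declared in the class and defaulted
// after the class definition. It is user-provided, hence non-trivial.
struct Defaulted_Out_Of_Class
{
    unsigned char bytes[ 4 * sizeof( void* ) ];
    Defaulted_Out_Of_Class( Defaulted_Out_Of_Class const& );
};
Defaulted_Out_Of_Class::Defaulted_Out_Of_Class( Defaulted_Out_Of_Class const& ) = default;

static_assert( std::is_trivially_copy_constructible< Defaulted_In_Class >::value,
               "defaulted in class: trivial copy constructor" );
static_assert( !std::is_trivially_copy_constructible< Defaulted_Out_Of_Class >::value,
               "defaulted out of class: non-trivial copy constructor" );
static_assert( std::is_trivially_copyable< Defaulted_In_Class >::value,
               "defaulted in class: trivially copyable" );
static_assert( !std::is_trivially_copyable< Defaulted_Out_Of_Class >::value,
               "defaulted out of class: not trivially copyable" );
```

If the translation unit compiles, all four assertions hold.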
Speculation:
- Is this a backwards compatibility thing with the C ABI?
- Does it have anything to do with http://eel.is/c++draft/class.temporary#3?
The calling convention is mandated by the ABI. The ABI specifies that both source functions' return values are allocated by the caller, with a hidden pointer to that storage passed to the callee. It also specifies that the trivial struct is passed on the stack, whereas the nontrivial one is passed by hidden pointer. Reference: x86-64 and C++ ABIs.
[class.temporary]/3 gives implementations latitude to create temporaries for arguments and return values, which makes the observed behavior OK. It does not mandate it.
The trivial struct is returned into stack storage allocated by the caller and must then be passed on the stack (both because of the ABI). One might ask why the compiler copies the struct from its first location on the stack to a second location on the stack. That copy is indeed unnecessary; the compiler could do better. Here's the GCC bug.
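The non-trivial case can also be observed without reading assembly. The following is a rough sketch, assuming a C++17 (or later) compiler, where initializing a parameter from a prvalue of a type with a non-trivial copy constructor does not create a copy. Tracked_Struct, copy_count, tracked_source, tracked_sink, and passed_without_copy are hypothetical names introduced for illustration only. The object remembers the address it was constructed at, and the sink compares that with the address of its own parameter.

```cpp
#include <cassert>  // for the checks shown in the usage note

// Hypothetical counter of copy-constructor calls.
int copy_count = 0;

// Hypothetical non-trivial type that remembers where it was first constructed.
struct Tracked_Struct
{
    unsigned char bytes[ 4 * sizeof( void* ) ] = {};
    void const* born_at;  // address of the originally constructed object

    Tracked_Struct() : born_at( this ) {}

    // User-provided copy constructor: makes the type non-trivial,
    // counts copies, and keeps the address of the original object.
    Tracked_Struct( Tracked_Struct const& other ) : born_at( other.born_at )
    {
        for ( unsigned i = 0; i < sizeof bytes; ++i )
            bytes[ i ] = other.bytes[ i ];
        ++copy_count;
    }
};

Tracked_Struct tracked_source()
{
    return Tracked_Struct();
}

// True only if the parameter is the very object built in tracked_source().
bool tracked_sink( Tracked_Struct arg )
{
    return arg.born_at == &arg;
}

bool passed_without_copy()
{
    return tracked_sink( tracked_source() );
}
```

Calling assert( passed_without_copy() ); followed by assert( copy_count == 0 ); from main should pass under the C++17 assumption. For a trivially copyable analogue no such check is guaranteed, since [class.temporary]/3 permits the implementation to introduce a temporary.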