Here is a C++ snippet. Func1 generates a shared object, which is directly moved into Func2. We would expect no overhead in Func3. However, when we put this snippet into Compiler Explorer, MSVC produces code that is 2-3 times shorter than what clang or GCC produce. Why is that, and can one obtain the shorter code with clang/GCC?

It looks like Func3 generates exception-handling code for cleaning up the temporary shared object.
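```cpp
#include <memory>

std::shared_ptr<double> Func1();         // defined elsewhere, opaque to the compiler
void Func2 (std::shared_ptr<double> s);  // takes the shared_ptr by value

void Func3()
{
    Func2(Func1());  // the returned shared_ptr initializes Func2's parameter
}
```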
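The snippet only declares Func1 and Func2, which is what keeps them opaque in Compiler Explorer. To actually run it, the sketch below supplies hypothetical definitions (not from the question): Func1 returns a freshly allocated double, and Func2 moves its parameter into a hypothetical global g_sink, so its own parameter s is already empty when it is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical global that ends up owning the value passed to Func2.
std::shared_ptr<double> g_sink;

// Hypothetical definition: returns a newly allocated value.
std::shared_ptr<double> Func1() {
    return std::make_shared<double>(3.14);
}

// Hypothetical definition: takes ownership by moving the parameter out.
void Func2(std::shared_ptr<double> s) {
    assert(s != nullptr);
    g_sink = std::move(s);  // s is now empty, so destroying it releases nothing
}

void Func3() {
    // Since C++17 the prvalue returned by Func1 initializes Func2's
    // parameter directly; no move constructor is involved.
    Func2(Func1());
}
```

After calling Func3(), g_sink should be the only owner of the value (use_count() == 1), which is the situation where the caller-side destructor call has nothing left to do.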
The problem boils down to the platform ABI, and it is better illustrated with a completely opaque type.
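A minimal sketch of such a type, assuming a class A whose user-provided move constructor and destructor make it non-trivial for the purposes of calls, like std::shared_ptr. The names A, make and take follow the explanation below; foo, g_events and the logging are hypothetical additions. To reproduce the assembly comparison, one would only declare make, take and A's special members so the compiler cannot see their bodies; here they are defined so the sketch runs. The C++ standard leaves it implementation-defined whether a parameter's lifetime ends when the function returns or at the end of the enclosing full-expression, and the comma operator in foo makes that difference observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of events, so the order of construction, call and destruction is visible.
std::vector<std::string> g_events;

// Hypothetical class A: the user-provided move constructor and destructor
// make it non-trivial for the purposes of calls.
struct A {
    A() { g_events.push_back("A()"); }
    A(A&&) noexcept { g_events.push_back("A(A&&)"); }
    ~A() { g_events.push_back("~A()"); }
};

// Hypothetical make(): yields a prvalue of type A.
A make() { return A{}; }

// Hypothetical take(A): receives A by value.
void take(A) {
    // Since C++17 the prvalue from make() initializes the parameter directly,
    // so no move constructor call is expected before the body runs.
    assert(g_events.back() == "A()");
    g_events.push_back("take body");
}

// Hypothetical caller. The comma operator keeps "after take" inside the same
// full-expression as the call, which exposes when the parameter is destroyed.
void foo() {
    (take(make()), g_events.push_back("after take"));
}
```

Calling foo() once should log "A()" and "take body", followed by "~A()" and "after take" in an ABI-dependent order. With MSVC, which destroys the parameter inside take, one would expect "~A()" first; with GCC or clang, where the caller destroys it, "after take" may come first. The sketch itself does not rely on either order.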
See the comparison at Compiler Explorer (MSVC output vs. GCC output; clang's output is very similar).
Relevant rule: Itanium C++ ABI §3.1.2.3, Non-Trivial Parameters
Explanation
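What takes place here is:
1. make() yields a prvalue of type A.
2. Because A is non-trivial for the purposes of calls, the caller materializes that prvalue as a temporary in its own frame and passes take(A) a reference to it.
3. With GCC and clang, which follow the Itanium C++ ABI, the caller also destroys that temporary A at the call site.

MSVC instead destroys the temporary A (or in your case, std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing is an inlined version of the std::shared_ptr destructor.

In the end, you shouldn't see any major performance impact as a result. However, if Func2 resets or releases the shared pointer, then most of the destructor code at the call site is effectively dead, unfortunately, because the shared_ptr it destroys is already empty. This ABI problem is similar to an issue with std::unique_ptr, which, being non-trivial for the purposes of calls, also cannot be passed in a register on the Itanium ABI the way a raw pointer can.

See Also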
Agner Fog, Calling conventions for different C++ compilers and operating systems