Here is a C++ snippet. Func1 generates a shared object, which is moved directly into Func2. We expected no overhead in Func3. Putting this snippet into Compiler Explorer, we see that MSVC generates code 2-3 times shorter than clang or GCC. Why is that, and can one obtain the shorter code with clang/GCC?
It looks like Func3 generates exception handling code for cleaning up the temporary shared object.
#include <memory>
std::shared_ptr<double> Func1();
void Func2(std::shared_ptr<double> s);
void Func3()
{
    Func2(Func1());
}
The difference boils down to the platform ABI, and is better illustrated by a completely opaque type:
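A minimal sketch of such a type is given here. The names A, make and take are the ones used in the explanation; foo, the live counter and the function bodies are assumptions added only so that the snippet links and runs. In a truly opaque version, the compiler would see just the declarations.

```cpp
// Number of A objects currently alive (hypothetical helper, illustration only).
int live = 0;

struct A {
    A() { ++live; }
    A(const A&) { ++live; }  // user-provided copy constructor
    A(A&&) { ++live; }       // user-provided move constructor
    ~A() { --live; }         // non-trivial destructor
};

// With an opaque type, only "A make();" and "void take(A);" would be visible.
A make() { return A{}; }
void take(A) {}

void foo() {
    // Itanium C++ ABI (GCC, clang): foo destroys the temporary after take
    // returns or throws, so cleanup code is emitted here.
    // MSVC ABI: take destroys its own parameter, so foo needs no cleanup.
    take(make());
}
```

Whichever side runs the destructor, the temporary is destroyed exactly once, so live is back to 0 after foo() returns. Only the location of the destructor call differs between the two ABIs.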
See comparison at Compiler Explorer
MSVC Output
GCC Output (clang is very similar)
- Itanium C++ ABI §3.1.2.3 Non-Trivial Parameters
Explanation
What takes place here is:
- make() yields a prvalue of type A
- this prvalue initializes the parameter of take(A)
- under the Itanium C++ ABI, the caller has to destroy A at the call site, even if take throws
MSVC instead destroys the temporary A (or in your case, std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing is an inlined version of the std::shared_ptr destructor.
In the end, you shouldn't see any major performance impact as a result. However, if Func2 resets/releases the shared pointer, then most of the destructor code at the call site is dead, unfortunately. This ABI problem is similar to an issue with std::unique_ptr.
See Also
Agner Fog - Calling conventions for different C++ compilers and operating systems