Overhead for moving std::shared_ptr?


Here is a C++ snippet. Func1 generates a shared object, which is passed directly into Func2. We think that there should not be any overhead in Func3. In Compiler Explorer, however, MSVC produces code that is 2-3 times shorter than what clang or GCC produce. Why is that, and can one obtain the shorter code with clang/GCC?

It looks like Func3 generates exception handling code for cleaning up the temporary shared object.

#include <memory>

std::shared_ptr<double> Func1();
void Func2 (std::shared_ptr<double> s);

void Func3()
{
  Func2(Func1());
}

Best Answer

The problem boils down to platform ABI, and is better illustrated by a completely opaque type:

struct A {
    A(const A&);
    A(A&&);
    ~A();
};

A make() noexcept;
void take(A) noexcept;

void foo() {
    take(make());
}

See comparison at Compiler Explorer

MSVC Output

void foo(void) PROC
        push    ecx
        push    ecx
        push    esp
        call    A make(void)
        add     esp, 4
        call    void take(A)
        add     esp, 8
        ret     0
void foo(void) ENDP

GCC Output (clang is very similar)

foo():
        sub     rsp, 24
        lea     rdi, [rsp+15]
        call    make()
        lea     rdi, [rsp+15]
        call    take(A)
        lea     rdi, [rsp+15]
        call    A::~A() [complete object destructor]
        add     rsp, 24
        ret

If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).

- Itanium C++ ABI §3.1.2.3 Non-Trivial Parameters

Explanation

What takes place here is:

  • make() yields a prvalue of type A
  • this is fed into the parameter of take(A)
    • mandatory copy elision takes place, so there is no call to copy/move constructors
  • only GCC and clang destroy A at the call site

MSVC instead destroys the temporary A (or in your case, std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing is an inlined version of the std::shared_ptr destructor.
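Whichever side of the call ends up running the destructor, the observable behaviour should be the same: no copy or move constructor runs, and exactly one object is destroyed. The following is a minimal sketch to check that, assuming a C++17 compiler; `Tracked`, `Counts`, `make_tracked`, `take_tracked` and `run_elision_demo` are hypothetical names invented for this illustration and are not part of the question.

```cpp
#include <cassert>

// Hypothetical counters for the special member functions of Tracked.
struct Counts {
    int copies = 0;
    int moves = 0;
    int dtors = 0;
};

static Counts g_counts;

// Hypothetical stand-in for the opaque type A, instrumented for counting.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_counts.copies; }
    Tracked(Tracked&&) noexcept { ++g_counts.moves; }
    ~Tracked() { ++g_counts.dtors; }
};

Tracked make_tracked() { return Tracked{}; }  // yields a prvalue
void take_tracked(Tracked) {}                 // by-value parameter

// Mirrors foo() from the answer and reports what happened.
Counts run_elision_demo() {
    g_counts = Counts{};
    take_tracked(make_tracked());
    // The parameter object is gone by now, no matter whether the
    // caller or the callee was responsible for destroying it.
    assert(g_counts.dtors == 1);
    return g_counts;
}
```

Calling `run_elision_demo()` should report zero copies, zero moves and one destructor call on MSVC, GCC and clang alike. Only the location of that single destructor call in the generated code differs between the ABIs.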

In the end, you shouldn't see any major performance impact as a result. However, if Func2 resets/releases the shared pointer, then most of the destructor code at the call site never does any work, yet it still has to be emitted because the caller cannot see what the callee did. This ABI problem is similar to an issue with std::unique_ptr:

There is also a language issue surrounding the order of destruction of function parameters and the execution of unique_ptr's destructor. For simplicity that is being ignored in this paper, but a complete solution to "unique_ptr is as cheap to pass a T*" would have to address that as well.
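The case mentioned earlier, where the callee takes ownership away from its parameter, can be sketched as follows. This is only an illustration under assumptions: `g_sink`, `make_value`, `store` and `run_sink_demo` are hypothetical names, and `store` plays the role of a Func2 that moves the pointer into longer-lived storage.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical long-lived owner that the callee hands the object to.
static std::shared_ptr<double> g_sink;

// Plays the role of Func1: returns a freshly created shared object.
std::shared_ptr<double> make_value() { return std::make_shared<double>(1.5); }

// Plays the role of a Func2 that moves out of its parameter.
void store(std::shared_ptr<double> s) {
    g_sink = std::move(s);
    // A moved-from shared_ptr is guaranteed to be empty, so destroying
    // the parameter afterwards has nothing left to release.
    assert(s == nullptr);
}

// Plays the role of Func3 and reports the resulting owner count.
long run_sink_demo() {
    g_sink.reset();
    store(make_value());
    return g_sink.use_count();
}
```

After `run_sink_demo()` the only owner is `g_sink`, so the destructor of the parameter had nothing to do at run time. Under the Itanium ABI the caller of `store` still contains the destructor call, because it cannot know that the parameter was emptied.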


See Also

Agner Fog - Calling conventions for different C++ compilers and operating systems