Is clang-tidy modernize-pass-by-value only right from C++20 on?

325 Views Asked by At

There are many questions and answers about whether to pass arguments by value or by reference. This answer https://stackoverflow.com/a/51706522/2492801 seems to indicate that passing by value and moving is not optimal for lvalues. It seems as if this benchmark for C++17 proves it: https://quick-bench.com/q/8pCCiw5tBtGwh8XaR9EsBTLZTzw. The case "BM_PassLvalueByValueAndMove" is the slowest of all. The cases where the lvalue is passed by reference ("BM_PassLvalueByReferenceAndCopy" and "BM_PassLValueToStructOfferingBoth") are faster.

But the clang-tidy check "modernize-pass-by-value" unconditionally proposes to pass by value in constructors: https://clang.llvm.org/extra/clang-tidy/checks/modernize/pass-by-value.html. Isn't this a bad advice? It may be beneficial for rvalues, but has downsides for lvalues.

But actually this also seems to depend on the language standard. When I change "C++17" to "C++20" in the benchmark (see https://quick-bench.com/q/U9di84h6HaNQw7urp6qEiTjYPMI), suddenly the case "BM_PassLvalueByValueAndMove" is not slower than the other lvalue cases anymore. This remains true for "C++23".

Does that mean that clang-tidy is right with its proposal, but only from C++20 on? What change in the standard has made passing by value and moving better also for lvalues?

Benchmark code:

#include <benchmark/benchmark.h>

#include <string>
#include <utility>

struct PassByValueAndMove
{
    explicit PassByValueAndMove(std::string string)
        : m_string(std::move(string))
    {
    }

    std::string m_string;
};

struct PassByReferenceAndCopy
{
    explicit PassByReferenceAndCopy(const std::string& string)
        : m_string(string)
    {
    }

    std::string m_string;
};

struct PassBoth
{
    explicit PassBoth(std::string&& string)
        : m_string(std::move(string))
    {
    }

    explicit PassBoth(const std::string& string)
        : m_string(string)
    {
    }

    std::string m_string;
};

static void BM_PassRvalueByReferenceAndCopy(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassByReferenceAndCopy instance("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassRvalueByReferenceAndCopy);

static void BM_PassRvalueByValueAndMove(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassByValueAndMove instance("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassRvalueByValueAndMove);

static void BM_PassLvalueByReferenceAndCopy(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            const std::string      dummy_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
            PassByReferenceAndCopy instance(dummy_string);
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassLvalueByReferenceAndCopy);

static void BM_PassLvalueByValueAndMove(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            const std::string  dummy_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
            PassByValueAndMove instance(dummy_string);
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassLvalueByValueAndMove);

static void BM_PassRvalueToStructOfferingBoth(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassBoth instance("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassRvalueToStructOfferingBoth);

static void BM_PassLValueToStructOfferingBoth(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            const std::string dummy_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
            PassBoth          instance(dummy_string);
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassLValueToStructOfferingBoth);
4

There are 4 best solutions below

0
On BEST ANSWER

I don't want to simply state "I can't reproduce your issue", but are you certain your measurements are correct and not caused by some quickbench's hiccup?

I have re-run your code multiple times and I keep getting the following results, which seem intuitive:

enter image description here

C++20 seems generally faster, but the proportions remain the same. I can try locally with gprof or perf, maybe that will shed some more light on the problem.

In of one of the comments you've mentioned:

The static code analysis tool Helix QACPP proposes the opposite (use const references), if the function parameter exceeds a certain maximum size

but unfortunately the link is down.

Limiting the size makes sense though: considering some large object, moving it down the call stack might involve lots of move operations on each individual member, while passing a reference can possibly be optimized to passing a single pointer (to the object itself) and a single move/copy of its multiple members when the actual construction/assignment happens.

Is that better/worse? Performance-wise: I don't know, measure for your case. It definitely has chances of being more efficient, but keep in mind that compilers are good at optimizing. EDIT: I did quickly measure it and it turns out that there are no significant differences. This might be my trivial to optimize example though. Again, measure, measure, measure.

#include <map>
#include <utility>
#include <array>


struct S
{
    std::array<double, 20000> a;
};

struct ReallyLarge
{
    S s1;
    std::map<char, int> m1{};
    std::map<char, long double> m2{};
    std::map<char, int> m3{};
    std::multimap<char, int> m4{};
    std::multimap<int, double> m5{};
    std::map<char, int> m6{};
    std::map<char, long double> m7{};
    std::map<char, int> m8{};
    std::multimap<char, int> m9{};
    std::multimap<int, double> m10{};
    std::map<char, int> m11{};
    std::map<char, long double> m12{};
    std::map<char, int> m13{};
    std::multimap<char, int> m14{};
    std::multimap<int, double> m15{};
    std::map<char, int> m16{};
    std::map<char, long double> m17{};
    std::map<char, int> m18{};
    std::multimap<char, int> m19{};
    std::multimap<int, double> m20{};
    S s2;
};


void f3v(ReallyLarge mm)
{
    auto x = std::move(mm);
}

void f2v(ReallyLarge mm)
{
    f3v(std::move(mm));
}

void f1v(ReallyLarge mm)
{
    f2v(std::move(mm));
}

void f0v(ReallyLarge mm)
{
    f1v(std::move(mm));
}


void f3r(ReallyLarge&& mm)
{
    auto x = std::move(mm);
}

void f2r(ReallyLarge&& mm)
{
    f3v(std::move(mm));
}

void f1r(ReallyLarge&& mm)
{
    f2v(std::move(mm));
}

void f0r(ReallyLarge&& mm)
{
    f1v(std::move(mm));
}



static void ByMoveVal(benchmark::State& state) {
  for (auto _ : state) {
    f0v(ReallyLarge{});
  }
}
BENCHMARK(ByMoveVal);

static void ByMoveRef(benchmark::State& state) {
  for (auto _ : state) {
    f0r(ReallyLarge{});
  }
}
BENCHMARK(ByMoveRef);

https://quick-bench.com/q/Mje6qd2wWwW0jFtjfxOd7ajKvO4 enter image description here

If performance is not critical, the main significant difference between the two is exception safety. For references, it's possible to catch the exception inside the caller while the original passed parameter remains intact. If moving into a value and the called function throws, the original object is lost on caller's end.

EDITED: Follow-up after re-running the original benchmark many times: it's highly inconsistent, also on my local machine. I have no clue at this stage what it's related to, I'd probably guess memory allocation, but that's just guess.

However, I have also performed one more test: with the string to copy from moved out from the measuring loop. I thought this would be more fair, as it eliminates one extra string construction from the equation, and the aim of the measurement is to learn the speed of move/copy construction. On my Apple M2 I've gotten this for both C++17 and C++20 with O3:

----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BM_PassRvalueByReferenceAndCopy         49.4 ns         49.4 ns    850220329
BM_PassRvalueByValueAndMove             21.8 ns         21.8 ns   1000000000
BM_PassLvalueByReferenceAndCopy         28.5 ns         28.5 ns   1000000000
BM_PassLvalueByValueAndMove             28.1 ns         28.1 ns   1000000000
BM_PassRvalueToStructOfferingBoth       21.8 ns         21.8 ns   1000000000
BM_PassLValueToStructOfferingBoth       28.4 ns         28.4 ns   1000000000

which is somewhat similar to what QuickBench gives me pretty consistently (Clang and libc++ to reflect setup I have at home): https://quick-bench.com/q/wgOUAodUb-iuS6_CnYgXt_rHbBI enter image description here


#include <benchmark/benchmark.h>

#include <string>
#include <utility>

struct PassByValueAndMove
{
    explicit PassByValueAndMove(std::string string)
        : m_string(std::move(string))
    {
    }

    std::string m_string;
};

struct PassByReferenceAndCopy
{
    explicit PassByReferenceAndCopy(const std::string& string)
        : m_string(string)
    {
    }

    std::string m_string;
};

struct PassBoth
{
    explicit PassBoth(std::string&& string)
        : m_string(std::move(string))
    {
    }

    explicit PassBoth(const std::string& string)
        : m_string(string)
    {
    }

    std::string m_string;
};

static void BM_PassRvalueByReferenceAndCopy(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassByReferenceAndCopy instance("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassRvalueByReferenceAndCopy);

static void BM_PassRvalueByValueAndMove(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassByValueAndMove instance("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassRvalueByValueAndMove);

static void BM_PassLvalueByReferenceAndCopy(benchmark::State& state)
{
    const std::string      dummy_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassByReferenceAndCopy instance(dummy_string);
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassLvalueByReferenceAndCopy);

static void BM_PassLvalueByValueAndMove(benchmark::State& state)
{
    const std::string  dummy_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassByValueAndMove instance(dummy_string);
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassLvalueByValueAndMove);

static void BM_PassRvalueToStructOfferingBoth(benchmark::State& state)
{
    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassBoth instance("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassRvalueToStructOfferingBoth);

static void BM_PassLValueToStructOfferingBoth(benchmark::State& state)
{
    const std::string dummy_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    for (auto _ : state)
    {
        static_cast<void>(_);

        {
            PassBoth          instance(dummy_string);
            benchmark::DoNotOptimize(instance);
        }
    }
}

BENCHMARK(BM_PassLValueToStructOfferingBoth);

2
On

The clang-tidy check only fires if you have a single constructor that takes a const reference and not if you also have a constructor that takes by rvalue reference (See: https://godbolt.org/z/4zWxoWoT4)

I think you can agree that the suggested fixit is a performance and readability win over the previous:

struct S {
    std::string member;

    explicit S(const std::string& s) : member(s) {}
    // clang-tidy suggests
    // explicit S(std::string s) : member(std::move(s)) {}
};

And taking by value and moving only results in 1 extra move for a glvalue (and no extra moves for a prvalue).

Worrying about that may be a premature optimization since a single move should be incredibly quick. It's not even a good premature optimization, since if that really was a performance bottleneck, you should switch to taking a lambda and constructing the value in-place to get rid of the moves and copies altogether.

0
On

Yes, this is a bad advice, but not for the reasons you explore here. In the majority of C++ codebases the rule regarding by-value vs by-const-reference is 1) if the argument has a good chance to fit into a register, pass always by value, and 2) pass by const reference otherwise.

A linter is not an instrument to improve code runtime really. This particular check is more of a team preference enforcer. It's for the case when the codebase is modern enough to make sure correct move constructors are available and the developers are savvy enough to understand how std::move works.

As your benchmark shows in the case of Clang, the compiler may optimize the code well even without fancy hard-to-judge-about C++ features.

I'd advise against this check. As you can see in the linter documentation, there are many limitations attached to it. std::move was always a memory management construct. As a software developer, instead of thinking hard about memory management I'd rather think about the problem domain and the task at hand.

As a side note, linters are known to provide controversial checks. They aim to appeal to different tastes so that you can always enforce the type of code style you like. These checks are not truth in the last instance.

0
On

I have measured it locally and pass by value and move is slightly faster. This is the comparison for msvc and clang: enter image description here

enter image description here

enter image description here

enter image description here

So as you can see performance is not a problem. The reason for this warning described in the link, that you have shared:

With move semantics added to the language and the standard library updated with move constructors added for many types it is now interesting to take an argument directly by value, instead of by const-reference, and then copy. This check allows the compiler to take care of choosing the best way to construct the copy.

By allowing compiler to do this for you, you can expect, that it will generate more efficient code with using of move constructors.