I'm currently reading the codebase for the cpr requests library: https://github.com/whoshuu/cpr/blob/master/include/cpr/api.h
I noticed that the interface for this library uses perfect forwarding quite often. I'm just learning rvalue references, so this is all relatively new to me.
From my understanding, the benefit of rvalue references, templating, and forwarding is that the function being wrapped takes its arguments by rvalue reference rather than by value, which avoids unnecessary copying. Thanks to reference deduction, it also saves one from having to write a bunch of overloads.
However, from my understanding, a const lvalue reference essentially does the same thing. It removes the need for overloads and passes everything by reference, with the caveat that if the function being wrapped takes a non-const reference, it won't compile.
So if nothing in the call stack needs a non-const reference, why not just pass everything by const lvalue reference?
I guess my main question here is: when should you use one over the other for best performance? I attempted to test this with the code below and got the following relatively consistent results:
Compiler: gcc 6.3 OS: Debian GNU/Linux 9
<<<<
Passing rvalue!
const l value: 2912214
rvalue forwarding: 2082953
Passing lvalue!
const l value: 1219173
rvalue forwarding: 1585913
>>>>
These results stay fairly consistent between runs. It appears that for an rvalue arg, the const lvalue signature is slightly slower, though I'm not exactly sure why, unless I'm misunderstanding this and a const lvalue reference does in fact make a copy of the rvalue.
For an lvalue arg we see the opposite: rvalue forwarding is slower. Why would this be? Shouldn't the reference deduction always produce a reference to an lvalue? If that's the case, shouldn't it be more or less equivalent to the const lvalue reference in terms of performance?
#include <iostream>
#include <string>
#include <utility>
#include <time.h>

std::string func1(const std::string& arg) {
    std::string test(arg);
    return test;
}

template <typename T>
std::string func2(T&& arg) {
    std::string test(std::forward<T>(arg));
    return test;
}

void wrap1(const std::string& arg) {
    func1(arg);
}

template <typename T>
void wrap2(T&& arg) {
    func2(std::forward<T>(arg));
}

int main()
{
    auto n = 100000000;

    /// Passing rvalue
    std::cout << "Passing rvalue!" << std::endl;

    // Test const l value
    auto t = clock();
    for (int i = 0; i < n; ++i)
        wrap1("test");
    std::cout << "const l value: " << clock() - t << std::endl;

    // Test rvalue forwarding
    t = clock();
    for (int i = 0; i < n; ++i)
        wrap2("test");
    std::cout << "rvalue forwarding: " << clock() - t << std::endl;

    std::cout << "Passing lvalue!" << std::endl;

    /// Passing lvalue
    std::string arg = "test";

    // Test const l value
    t = clock();
    for (int i = 0; i < n; ++i)
        wrap1(arg);
    std::cout << "const l value: " << clock() - t << std::endl;

    // Test rvalue forwarding
    t = clock();
    for (int i = 0; i < n; ++i)
        wrap2(arg);
    std::cout << "rvalue forwarding: " << clock() - t << std::endl;
}
First of all, here are slightly different results from your code. As mentioned in comments, compiler and its settings are very important. In particular, you may notice that all cases have similar runtime, except for the first one, which is about twice as slow.
Let's look at exactly what happens in each case.
1) When calling wrap1("test"), since the signature of that function expects a const std::string &, the char array you are passing is implicitly converted to a temporary std::string object on every call (i.e. n times), which involves a copy* of the value. A const reference to that temporary is then passed into func1, where another std::string is constructed from it, which again involves a copy (since it is a const reference, it cannot be moved from, despite in fact referring to a temporary). Even though the function returns by value, that copy would normally be elided through (N)RVO, or at worst replaced by a move, if the return value was used. In this case the return value is not used, and I'm not entirely sure whether the standard allows the compiler to optimize away the construction of test. I suspect not, since in general such construction could have observable side effects (and your results suggest it does not get optimized away). To sum up, a full construction and destruction of std::string is performed twice in this case.

2) When calling wrap2("test"), the argument is a string literal, which is an lvalue of type const char[5]. The template parameter T is therefore deduced as const char (&)[5], and the array is forwarded as an lvalue reference all the way to func2, where the std::string constructor taking a const char* is called and copies the characters. There is nothing to move from here, since the argument is const and is not an std::string. Compared to the previous case, construction/destruction of a string only happens once per call (the const char[5] literal is always in memory and incurs no overhead).

3) When calling wrap1(arg), you are passing an lvalue as a const std::string & through the chain, and one copy constructor is called in func1.

4) When calling wrap2(arg), this is similar to the previous case: the deduced type for T is std::string &, so std::forward passes an lvalue reference down the chain and the copy constructor is called once in func2.

5) I'm assuming your test was designed to demonstrate the advantage of perfect forwarding when a copy of the argument needs to be made at the bottom of the call chain (hence the creation of test). In this case, you need to replace the "test" argument in the first two cases with std::string("test") in order to truly have an std::string && argument, and also fix your perfect forwarding to be std::forward<T>(arg), as mentioned in comments. In that case the results are similar to what we had before, but now a move constructor is actually invoked.
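To make the deductions in the cases above easier to check, here is a minimal sketch that asks the compiler what T becomes for each kind of argument. The names Probe, probe, forwards_as_lvalue and check_value_categories are hypothetical helpers introduced only for this illustration; they are not part of your code or of cpr.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: captures the type deduced for T by a forwarding reference.
template <typename T>
struct Probe { using type = T; };

// Declaration only; it is used purely inside decltype (unevaluated context).
template <typename T>
Probe<T> probe(T&&);

// A string literal is an lvalue, so T is an lvalue reference to the array.
static_assert(std::is_same<decltype(probe("test"))::type,
                           const char (&)[5]>::value,
              "string literal deduces T as const char (&)[5]");

// A non-const std::string lvalue deduces T as std::string&.
static_assert(std::is_same<decltype(probe(std::declval<std::string&>()))::type,
                           std::string&>::value,
              "std::string lvalue deduces T as std::string&");

// An rvalue std::string deduces T as plain std::string, so T&& is std::string&&.
static_assert(std::is_same<decltype(probe(std::declval<std::string>()))::type,
                           std::string>::value,
              "std::string rvalue deduces T as std::string");

// Runtime view of the same fact: was the argument forwarded as an lvalue?
template <typename T>
bool forwards_as_lvalue(T&&) {
    return std::is_lvalue_reference<T>::value;
}

// Usage sketch covering the wrap2 cases discussed above.
inline void check_value_categories() {
    std::string s = "test";
    assert(forwards_as_lvalue("test"));                // literal: lvalue
    assert(forwards_as_lvalue(s));                     // named string: lvalue
    assert(!forwards_as_lvalue(std::string("test")));  // temporary: rvalue
}
```

If any of the static_assert lines were wrong, the file would fail to compile, which makes this a cheap way to test one's intuition about forwarding references.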
I hope this helps explain the results. There may be some other issues related to inlining of function calls and other compiler optimizations, which would help explain the smaller discrepancies between cases 2-4.
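Rather than inferring copies and moves from timings, one can count them directly. The sketch below mirrors the shape of your wrap1/func1 and wrap2/func2, but with a hypothetical Tracked type standing in for std::string, and with sinks that return void so that return-value optimization does not affect the counts. All names here are assumptions made for the illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for std::string that counts copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as func1/wrap1: const lvalue reference all the way down.
void sink_const_ref(const Tracked& arg) {
    Tracked local(arg);  // always the copy constructor
    (void)local;
}
void wrap_const_ref(const Tracked& arg) { sink_const_ref(arg); }

// Same shape as func2/wrap2: forwarding reference plus std::forward.
template <typename T>
void sink_forward(T&& arg) {
    Tracked local(std::forward<T>(arg));  // copy for lvalues, move for rvalues
    (void)local;
}
template <typename T>
void wrap_forward(T&& arg) { sink_forward(std::forward<T>(arg)); }

struct Counts { int copies; int moves; };

// Resets the counters, runs the call, and reports what happened.
template <typename F>
Counts measure(F&& call) {
    Tracked::copies = 0;
    Tracked::moves = 0;
    call();
    return {Tracked::copies, Tracked::moves};
}

// Usage sketch: only forwarding an rvalue turns the copy into a move.
inline void check_counts() {
    Tracked t;
    Counts lvalue_fwd = measure([&] { wrap_forward(t); });
    Counts rvalue_ref = measure([&] { wrap_const_ref(Tracked()); });
    Counts rvalue_fwd = measure([&] { wrap_forward(Tracked()); });
    assert(lvalue_fwd.copies == 1 && lvalue_fwd.moves == 0);
    assert(rvalue_ref.copies == 1 && rvalue_ref.moves == 0);
    assert(rvalue_fwd.copies == 0 && rvalue_fwd.moves == 1);
}
```

For lvalues the two designs do the same work; the forwarding version only pays off when the caller actually hands over an rvalue of the right type.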
As to your question of which approach to use, I suggest reading Scott Meyers' "Effective Modern C++", items 23-30. Apologies for a book reference instead of a direct answer, but there is no silver bullet, and the optimal choice is always case-dependent, so it's better to just understand the trade-offs of each design decision.
* A copy constructor may or may not involve dynamic memory allocation due to Short String Optimization; thanks to ytoledano for bringing this up in the comments. Also, I've implicitly assumed throughout the answer that a copy is significantly more expensive than a move, which is not always the case.
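As a rough way to see the footnote in action, the following sketch checks whether a string's character buffer lives inside the std::string object itself. The helper stored_inline is hypothetical, and the result for short strings such as "test" depends on the standard library implementation, so treat it as a diagnostic rather than a guarantee.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true when the character buffer lies within the bytes
// of the std::string object, i.e. no separate heap buffer is in use.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    std::less<const char*> before;  // total order even for unrelated pointers
    return !before(s.data(), begin) && before(s.data(), end);
}

// Usage sketch: a 1000-character buffer cannot fit inside the object on any
// implementation, so it must live elsewhere.
inline void check_long_string() {
    std::string long_text(1000, 'x');
    assert(!stored_inline(long_text));
}
```

When stored_inline(std::string("test")) returns true on your toolchain, copying such a string involves no allocation, and a move has to copy the same few bytes, which is one reason the gap between copy and move can be small in a benchmark like yours.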