Given a chain of lambdas where each one captures the previous one by value:
auto l1 = [](int a, int b) { std::cout << a << ' ' << b << '\n'; };
auto l2 = [=](int a, int b) { std::cout << a << '-' << b << '\n'; l1(a, b); };
auto l3 = [=](int a, int b) { std::cout << a << '#' << b << '\n'; l2(a, b); };
auto l4 = [=](int a, int b) { std::cout << a << '%' << b << '\n'; l3(a, b); };
std::cout << sizeof(l4);
We can observe, that the resulting sizeof
of l4
is equal to 1
.
That makes sense to me. We are capturing lambdas by value and each of those objects has to have sizeof
equal to 1
, but since they are stateless, an optimization similar to [[no_unique_address]]
one applies (especially since they all have unique types).
However, when I try to create a generic builder for chaining comparators, this optimization no longer takes place:
template <typename Comparator>
auto comparing_by(Comparator&& comparator) {
return comparator;
}
template <typename Comparator, typename... Comparators>
auto comparing_by(Comparator&& comparator, Comparators&&... remaining_comparators) {
return [=](auto left, auto right) {
auto const less = comparator(left, right);
auto const greater = comparator(right, left);
if (!less && !greater) {
return comparing_by(remaining_comparators...)(left, right);
}
return less;
};
}
struct triple {
int x, y, z;
};
auto main() -> int {
auto by_x = [](triple left, triple right) { return left.x < right.x; };
auto by_y = [](triple left, triple right) { return left.y < right.y; };
auto by_z = [](triple left, triple right) { return left.z < right.z; };
auto comparator = comparing_by(by_x, by_z, by_y);
std::cout << sizeof(comparator);
}
Note 1: I am aware of the fact that comparing_by
is inefficient and sometimes calls the comparator in a redundant fashion.
Why in the above case the resulting sizeof
of comparator
is equal to 3
and not to 1
? It is still stateless, after all. Where am I wrong? Or is it just a missed optimization in all of the big three compilers?
Note 2: This is purely an academic question. I am not trying to solve any particular problem.
What's happening in the first example is not what you think it is. Let's say
l1
has typeL1
,l2
L2
, etc. These are the members of those types:And in your next example, you capture all the lambdas by value, so the closure type has three non-overlapping members, so the size will be at least 3.
[[no_unique_address]]
can't be generically applied to the data members of a closure type (consider a empty class that puts its address in a global map).The compiler could use empty base optimisation for a "well behaved type" (a trivilly-copyable empty type maybe?), so this might be a missed optimisation. The standard says this about what can be done ([expr.prim.lambda.closure]p2):
So the change in size is OK, but it would have to be done so that
is_empty_v<lambda_that_captures_stateless_lambda>
is nottrue
(since that's an observable behaviour)To "manually" apply this optimisation, you can, instead of calling the lambda
comparator(left, right)
, default construct something of the type of the closure type and call that (decltype(comparator){}(left, right)
). I've implemented that here: https://godbolt.org/z/73M1Gd3o5