Which is better for program speed after compiler-optimization: return-by-value, or return-by-reference to a persistent object?
/// Generate a 'foo' value directly as a return type.
template< typename T >
inline T gen_foo();
/// Get a 'foo' reference of a persistent object.
template< typename T >
inline T const& get_foo();
T will be a primitive, a pointer, a member-pointer, or a small, user-defined, POD-like type.
To the best of my knowledge the answer is return-by-value, but there is a possible case for return-by-reference:
return-by-value:
- returning one T is returning a small object that is fast to copy into a caller's variable.
- optimizer can use (N)RVO and copy-elision to remove return copies.
- optimizer can inline the generating code or the generated value into the caller's code.
- program will not need to access RAM, cached or not.
return-by-reference:
- optimizer might evaluate the persistent value fully, and replace its use with a literal equivalent. Whether or not this occurs affects the rest of the analysis.
- if the persistent value is fully evaluated and substituted as a literal:
  - no value to return.
  - optimizer can inline the literal easily.
  - program won't need to access RAM, cached or not.
- if the persistent value can't be fully evaluated and substituted:
  - returning one reference is returning a small object that is fast to copy into a caller's variable.
  - optimizer can use (N)RVO and copy-elision to avoid return copies.
  - optimizer can't inline the generating code or the generated value into the caller's code.
  - program would need to access RAM, although this likely would be in L1/L2/etc. cache.
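To make the comparison concrete, here is a minimal sketch of the two return styles side by side. Vec2, gen_vec2, get_vec2 and the two sum_* helpers are hypothetical names invented for this illustration. The idea is to compile both call sites with optimization enabled and compare the generated assembly. Whether both collapse to the same constant depends on the compiler and its flags, so this is something to measure rather than assume.

```cpp
#include <type_traits>

// Hypothetical small POD-like type standing in for T.
struct Vec2 {
    float x;
    float y;
};
static_assert(std::is_trivially_copyable<Vec2>::value,
              "Vec2 is meant to be cheap to copy");

// Return-by-value: the caller receives its own copy of the value.
inline Vec2 gen_vec2() {
    return Vec2{1.0f, 2.0f};
}

// Return-by-reference: the caller receives a reference to a persistent object.
inline Vec2 const& get_vec2() {
    static const Vec2 persistent{1.0f, 2.0f};
    return persistent;
}

// Two otherwise identical call sites; compare their optimized assembly.
inline float sum_by_value() {
    Vec2 const v = gen_vec2();
    return v.x + v.y;
}

inline float sum_by_reference() {
    Vec2 const& v = get_vec2();
    return v.x + v.y;
}
```

Both helpers compute the same result, so any difference in the emitted code comes from the return style alone.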
Background:
I'm being forced to consider this because on some platforms, some floating-point exceptions are triggered if I return-by-value, but not if I fill-by-parameter-reference. (This is a given; this question is not to debate this point.) So, the API I wanted and the API I'm forced to consider using are:
/// Generate a 'foo' value directly as a return type.
template< typename T >
inline T gen_foo();
/// Fill in a 'foo' passed in by reference.
template< typename T >
inline void fill_foo( T& r_foo );
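At a call site, the two APIs read roughly as follows. This is only a sketch: Foo is a hypothetical concrete type, and non-template overloads with made-up bodies stand in for the templates declared above.

```cpp
// Hypothetical small POD-like type standing in for T.
struct Foo {
    int a;
    int b;
};

// Assumed stand-in implementations; the real ones are not shown in the question.
inline Foo gen_foo() {
    return Foo{1, 2};
}

inline void fill_foo(Foo& r_foo) {
    r_foo.a = 1;
    r_foo.b = 2;
}

// 'gen' style: definition and initialization are one statement,
// the variable can be const, and temporaries work inside expressions.
inline int use_gen() {
    Foo const foo = gen_foo();
    return foo.a + gen_foo().b;
}

// 'fill' style: the variable must be defined first and filled afterwards,
// so it cannot be const and no temporary can be used.
inline int use_fill() {
    Foo foo;
    fill_foo(foo);
    return foo.a + foo.b;
}
```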
Since I abhor the 'fill' API (because it separates definition from initialization, prevents creating temporaries, etc.), I can transform it into a return-by-reference version instead, something like:
/// Forward-declare 'Initialized_Foo'.
template< typename T > struct Initialized_Foo;
/// Get a 'foo' reference; this returns a persistent reference to a static object.
template< typename T >
inline T const& get_foo()
{
#if 0
    // BAD: This calls 'fill_foo' *every* time, and breaks const-correctness.
    thread_local static const T foo;
    fill_foo( const_cast< T& >( foo ) );
    return foo;
#else
    // GOOD: This calls 'fill_foo' only *once*, and honours const-correctness.
    thread_local static const Initialized_Foo< T > initialized_foo;
    return initialized_foo.data;
#endif
}
/// A 'foo' initializer to call 'fill_foo' at construction time.
template< typename T >
struct Initialized_Foo
{
    T data;
    Initialized_Foo()
    {
        fill_foo( data );
    }
};
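A small way to check the "only once" claim is to count calls to fill_foo. The sketch below repeats the wrapper from above and adds a hypothetical Foo type, a hypothetical per-thread counter fill_calls, and a non-template fill_foo overload with a made-up body. All three are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical small POD-like type standing in for T.
struct Foo {
    int a;
    int b;
};

// Hypothetical per-thread counter used to observe how often fill_foo runs.
thread_local int fill_calls = 0;

// Assumed stand-in for the real fill_foo.
inline void fill_foo(Foo& r_foo) {
    ++fill_calls;
    r_foo.a = 1;
    r_foo.b = 2;
}

// Calls fill_foo once, at construction time.
template <typename T>
struct Initialized_Foo {
    T data;
    Initialized_Foo() { fill_foo(data); }
};

// Returns a reference to a per-thread persistent object.
template <typename T>
inline T const& get_foo() {
    thread_local static const Initialized_Foo<T> initialized_foo;
    return initialized_foo.data;
}

// Requests the value twice and reports how many fills this thread has done.
inline int call_get_foo_twice() {
    Foo const& first = get_foo<Foo>();
    Foo const& second = get_foo<Foo>();
    assert(&first == &second);  // both calls refer to the same persistent object
    return fill_calls;
}
```

Because the object is thread_local, each thread that calls get_foo performs its own single fill. Every call also goes through the compiler's check for whether the object has been initialized yet, which is a cost that return-by-value does not have.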