Why haven't modern compilers removed the need for expression templates?


The standard pitch for expression templates in C++ is that they increase efficiency by removing unnecessary temporary objects. Why can't C++ compilers already remove these unnecessary temporary objects?


This is a question that I think I already know the answer to, but I want to confirm since I couldn't find a low-level answer online.

Expression templates essentially allow/force an extreme degree of inlining. However, even with inlining, compilers cannot optimize out calls to operator new and operator delete because they treat those calls as opaque since those calls can be overridden in other translation units. Expression templates completely remove those calls for intermediate objects.
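As a sketch of the technique, here is a minimal expression template for element-wise vector addition (the names `Vec` and `AddExpr` are illustrative, not from any particular library). A chain like `a + b + c` builds a lightweight expression tree on the stack; the single heap allocation happens only when the result is assigned:

```cpp
#include <cstddef>
#include <vector>

// Minimal expression-template sketch (illustrative, not a real library).
// AddExpr merely records references to its operands; no intermediate
// vector is allocated for a + b.
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    std::size_t size() const { return lhs.size(); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

struct Vec {
    std::vector<double> data;
    Vec() = default;
    explicit Vec(std::vector<double> d) : data(std::move(d)) {}
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { return data[i]; }

    // The whole expression tree is evaluated here in one loop with one
    // allocation, instead of one temporary Vec per '+'.
    template <typename L, typename R>
    Vec& operator=(const AddExpr<L, R>& e) {
        data.resize(e.size());
        for (std::size_t i = 0; i < e.size(); ++i) data[i] = e[i];
        return *this;
    }
};

inline AddExpr<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template <typename L, typename R>
AddExpr<AddExpr<L, R>, Vec> operator+(const AddExpr<L, R>& l, const Vec& r) {
    return {l, r};
}
```

Note that the `AddExpr` nodes hold references to temporaries, which is safe only because the whole tree is consumed within the same full expression.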

These superfluous calls to operator new and operator delete can be seen in a simple example where we only copy:

#include <array>
#include <vector>

std::vector<int> foo(std::vector<int> x)
{
    std::vector<int> y{x};
    std::vector<int> z{y};
    return z;
}

std::array<int, 3> bar(std::array<int, 3> x)
{
    std::array<int, 3> y{x};
    std::array<int, 3> z{y};
    return z;
}

In the generated code, we see that foo() compiles to a relatively lengthy function with two calls to operator new and one call to operator delete while bar() compiles to only a transfer of registers and doesn't do any unnecessary copying.

Is this analysis correct?

Could any C++ compiler legally elide the copies in foo()?

There are 2 answers below.

BEST ANSWER

However, even with inlining, compilers cannot optimize out calls to operator new and operator delete because they treat those calls as opaque since those calls can be overridden in other translation units.

Since C++14, this is no longer true: allocation calls can be optimized out, or merged into other allocations, under certain conditions:

[expr.new#10] An implementation is allowed to omit a call to a replaceable global allocation function. When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression. [conditions follow]

So foo() may nowadays be legally optimized into something equivalent to bar().
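As a small illustration (whether the elision actually happens depends on the compiler and optimization level), consider a function whose new/delete pair is a textbook candidate for this rule:

```cpp
// Under [expr.new]/10 a compiler may omit this replaceable global
// allocation entirely: with optimizations enabled, GCC and Clang
// typically compile boxed_sum down to a plain addition, with no call
// to operator new or operator delete. The observable behavior is the
// same either way.
int boxed_sum(int a, int b) {
    int* p = new int(a + b);  // candidate for elision
    int result = *p;
    delete p;                 // paired with the (possibly elided) allocation
    return result;
}
```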


Expression templates essentially allow/force an extreme degree of inlining

IMO the point of expression templates is not so much inlining per se; it's rather about exploiting the symmetries of the type system of the domain-specific language that the expression models.

For example, when you multiply three, say, Hermitian matrices, an expression template can use a space- and time-optimized algorithm that exploits the fact that the product is associative and that Hermitian matrices are self-adjoint, reducing the total operation count (and possibly even improving accuracy). And all this occurs at compile time.

Conversely, a compiler cannot know what a Hermitian matrix is; it is constrained to evaluate the expression the brute-force way (according to your implementation's floating-point semantics).
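The associativity point can be made concrete with ordinary matrix-chain cost counting (a sketch with made-up dimensions):

```cpp
#include <cstddef>

// Scalar multiplications needed to multiply an (m x n) matrix by an
// (n x p) matrix with the schoolbook algorithm.
std::size_t mul_cost(std::size_t m, std::size_t n, std::size_t p) {
    return m * n * p;
}

// For A (10x100), B (100x5), C (5x50), the two parenthesizations of
// A*B*C differ by an order of magnitude in work -- exactly the kind of
// compile-time choice an expression template can make, and that a
// compiler evaluating the expression the brute way cannot.
std::size_t cost_ab_then_c() { return mul_cost(10, 100, 5) + mul_cost(10, 5, 50); }   // (A*B)*C
std::size_t cost_bc_then_a() { return mul_cost(100, 5, 50) + mul_cost(10, 100, 50); } // A*(B*C)
```

Here (A*B)*C costs 7,500 scalar multiplications versus 75,000 for A*(B*C).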

SECOND ANSWER

There are two kinds of expression templates.

One kind is about domain specific languages embedded directly in C++. Boost.Spirit turns expressions into recursive descent parsers. Boost.Xpressive turns them into regular expressions. Good old Boost.Lambda turns them into function objects with argument placeholders.

Obviously there is nothing a compiler can do to remove that need. It would take special-purpose language extensions to add the capabilities the eDSL adds, the way lambdas were added in C++11. But it is not productive to do that for every eDSL ever written; it would make the language gigantic and impossible to comprehend, among other problems.

The second kind of expression template keeps the high-level semantics the same but optimizes execution: it applies domain-specific knowledge to transform expressions into more efficient execution paths. A linear algebra library might do that as Massimiliano explained in his answer, and a SIMD library like Boost.Simd might translate multiple operations into a single fused operation such as multiply-add.
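The fused multiply-add idea can be sketched with a tiny scalar expression template (illustrative only; this is not Boost.Simd's actual API) that rewrites a * b + c into a single std::fma call:

```cpp
#include <cmath>

// Sketch: recognize the pattern a * b + c at compile time and lower it
// to one fused multiply-add (a single operation with a single rounding).
// Illustrative only; real SIMD libraries do this for vector types.
struct Scalar { double v; };

// Multiplication computes nothing yet; it just records the operands.
struct MulExpr { double a, b; };

inline MulExpr operator*(Scalar x, Scalar y) { return {x.v, y.v}; }

// Adding a Scalar to a pending multiplication fuses the two operations.
inline double operator+(MulExpr m, Scalar c) { return std::fma(m.a, m.b, c.v); }
```

With `Scalar a{2}, b{3}, c{4}`, the expression `a * b + c` evaluates as a single `std::fma(2, 3, 4)`.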

These libraries provide services that a compiler could, in theory, perform without any change to the language specification. However, to do so the compiler would have to recognize the domain in question and carry built-in knowledge of every such domain to do the transformations. That approach is far too complex; it would make compilers huge and even slower than they already are.

An alternative to expression templates for this second kind of library would be compiler plugins: instead of writing a special matrix class with all the expression-template magic, you write a plugin for the compiler that knows about the matrix type and transforms the AST the compiler uses. The problem with this approach is that either the compilers would have to agree on a plugin API (not going to happen; they work too differently internally), or the library author would have to write a separate plugin for every compiler they want their library to be usable with (or at least performant on).