What optimizations are enabled by non-type template parameters?

224 Views Asked by At

I found this example at cppreference.com, and it seems to be the defacto example used through-out StackOverflow:

template<int N>
struct S {
    int a[N];
};

Surely, non-type templatization has more value than this example. What other optimizations does this syntax enable? Why was it created?

I am curious, because I have code that is dependent on the version of a separate library that is installed. I am working in an embedded environment, so optimization is important, but I would like to have readable code as well. That being said, I would like to use this style of templating to handle version differences (examples below). First, am I thinking of this correctly, and MOST IMPORTANTLY does it provide a benefit or drawback over using a #ifdef statement?

Attempt 1:

template<int VERSION = 500>
void print (char *s);

template<int VERSION>
void print (char *s) {
    std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
}

template<>
void print<500> (char *s) {
    // print using 500 syntax
}

template<>
void print<600> (char *s) {
    // print using 600 syntax
}

OR - Since the template is constant at compile time, could a compiler consider the other branches of the if statement dead code using syntax similar to:

Attempt 2:

template<int VERSION = 500>
void print (char *s) {
   if (VERSION == 500) {
       // print using 500 syntax
   } else if (VERSION == 600) {
       // print using 600 syntax
   } else {
       std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
   }
}

Would either attempt produce output comparable in size to this?

void print (char *s) {
#if defined(500)
    // print using 500 syntax
#elif defined(600)
    // print using 600 syntax
#else
    std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
#endif
}

If you can't tell I'm somewhat mystified by all this, and the deeper the explanation the better as far as I'm concerned.

3

There are 3 best solutions below

2
On BEST ANSWER

An obvious optimization is when using an integer, the compiler has a constant rather than a variable:

int foo(size_t); // definition not visible
// vs
template<size_t N>
size_t foo() {return N*N;}

With the template, there's nothing to compute at runtime, and the result may be used as a constant, which can aid other optimizations. You can take this example further by declaring it constexpr, as 5gon12eder mentioned below.

Next example:

int foo(double, size_t); // definition not visible
// vs
template<size_t N>
size_t foo(double p) {
 double r(p);
 for (size_t i(0) i < N; ++i) {
  r *= p;
 }
 return r;
}

Ok. Now the number of iterations of the loop is known. The loop may be unrolled/optimized accordingly, which can be good for size, speed, and eliminating branches.

Also, basing off your example, std::array<> exists. std::array<> can be much better than std::vector<> in some contexts, because std::vector<> uses heap allocations and non-local memory.

There's also the possibility that some specializations will have different implementations. You can separate those and (potentially) reduce other referenced definitions.

Of course, templates<> can also work against you unnecessarily duplication of your programs.

templates<> also require longer symbol names.

Getting back to your version example: Yes, it's certainly possible that if VERSION is known at compilation, the code which is never executed can be deleted and you may also be able to reduce referenced functions. The primary difference will be that void print (char *s) will have a shorter name than the template (whose symbol name includes all template parameters). For one function, that's counting bytes. For complex programs with many functions and templates, that cost can go up quickly.

0
On

There is an enormous range of potential applications of non-typename template parameters. In his book The C++ Programming Language, Stroustrup gives an interesting example that sketches out a type-safe zero-overhead framework for dealing with physical quantities. Basically, the idea is that he writes a template that accepts integers denoting the powers of fundamental physical quantities such as length or mass and then defines arithmetic on them. In the resulting framework, you can add speed with speed or divide distance by time but you cannot add mass to time. Have a look at Boost.Units for an industry-strength implementation of this idea.

For your second question. Any reasonable compiler should be able to produce exactly the same machine code for

#define FOO

#ifdef FOO
do_foo();
#else
do_bar();
#endif

and

#define FOO_P 1

if (FOO_P)
  do_foo();
else
  do_bar();

except that the second version is much more readable and the compiler can catch errors in both branches simultaneously. Using a template is a third way to generate the same code but I doubt that it will improve readability.

1
On

Compilers find dead code elimination easy. That is the case where you have a chain of ifs depending (only) on a template parameter's value or type. All branches must contain valid code, but when compiled and optimized the dead branches evaporate.

A classic example is a per pixel operation written with template parameters that control details of code flow. The body can be full of branches, yet the compiled output branchless.

Similar techniques can be used to unroll loops (say scanline loops). Care must be taken to understand the code size multiplication that can result: especially if your compiler lacks ICF (aka comdat folding) such as the gold gcc linker and msvc (among others) have.

Fancier things can also be done, like manual jump tables.

You can do pure compile time type checks with no runtime behaviour at alll stuff like dimensional analysis. Or distinguish between points and vectors in n-space.

Enums can be used to name types or switches. Pointers to functions to enable efficient inlining. Pointers to data to allow 'global' state that is mockable, or siloable, or decoupled from implementation. Pointers to strings to allow efficient readable names in code. Lists of integral values for myriads of purposes, like the indexes trick to unpack tuples. Complex operations on static data, like compile time sorting of data in multiple indexes, or checking integrity of static data with complex invariants.

I am sure I missed some.