unrolling for loops in a special case function

959 Views Asked by At

So I'm trying to optimize some code. I have a function with a variable sized loop. However for efficiency sake I want to make cases with 1, 2 and 3 sized loops special cases that are completely unrolled. My approach so far is to declare the loop size as a const parameter then define wrapper functions that call the main function handing it a literal for the const value. I've included a code snip it illustrating the kind of thing i have in mind.

inline void someFunction (const int a)
{
    for (int i=0; i<a; i++)
    {
        // do something with i.
    }
}

void specialCase()
{
    someFunction (3);
}

void generalCase(int a)
{
    someFunction (a);
}

So my question is is it reasonable for me to expect my compiler (GCC) to unroll the for loop inside of specialCase. I'm mean obviously I can copy - paste the contents of someFunction into specialCase and replace a with 3 but I'd rather only deal with one definition of someFunction in my code for clarity sake.

4

There are 4 best solutions below

1
On

However for efficiency sake I want to make cases with 1, 2 and 3 sized loops special cases that are completely unrolled.

Have you measured that this is actually faster? I doubt it will be (or that the compiler won't unroll the loop automatically).


My approach so far is to declare the loop size as a const parameter then define wrapper functions that call the main function handing it a literal for the const value.

const doesn't mean anything here. It won't affect the compiler's ability to unroll the loop. It just means that a cannot be mutated inside the function body, but it's still a runtime argument.


If you want to ensure unrolling, then force it. It's quite easy with C++17.

template <typename F, std::size_t... Is>
void repeat_unrolled_impl(F&& f, std::index_sequence<Is...>)
{
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t Iterations, typename F>
void repeat_unrolled(F&& f)
{
    repeat_unrolled_impl(std::forward<F>(f), 
                         std::make_index_sequence<Iterations>{});
}

live example on godbolt

0
On

How about this C++20 unrolling-helpers:

#pragma once
#include <utility>
#include <concepts>
#include <iterator>

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
    {
        (fn( Indices ), ...);
    };
    unroll_n( std::make_index_sequence<N>() );
}

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
    {
        return ((Indices, fn()), ...);
    };
    unroll_n( std::make_index_sequence<N>() );
}

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
    {
        return (fn( Indices ) && ...);
    };
    return unroll_n( std::make_index_sequence<N>() );
}

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
    {
        return ((Indices, fn()) && ...);
    };
    return unroll_n( std::make_index_sequence<N>() );
}

template<std::size_t N, typename RandomIt, typename UnaryFunction>
    requires std::random_access_iterator<RandomIt>
    && requires( UnaryFunction fn, typename std::iterator_traits<RandomIt>::value_type elem ) { { fn( elem ) }; }
inline
RandomIt unroll_for_each( RandomIt begin, RandomIt end, UnaryFunction fn )
{
    RandomIt &it = begin;
    if constexpr( N > 1 )
        for( ; it + N <= end; it += N )
            unroll<N>( [&]( size_t i ) { fn( it[i] ); } );
    for( ; it < end; ++it )
        fn( *begin );
    return it;
}

But be aware that the unrolling-factor is crucial here. Unrolling is not always beneficial and sometimes unrolling beyond the optimal CPU-specific unrolling-factor drops to the performance without unrolling.

0
On

I'd rather go this way, if you're willing to use the force-inline (non-standard) feature of all popular compilers:

__attribute__((always_inline))
void bodyOfLoop(int i) {
  // put code here
}

void specialCase() {
    bodyOfLoop(0);
    bodyOfLoop(1);
    bodyOfLoop(2);
}

void generalCase(int a) {
    for (int i=0; i<a; i++) {
        bodyOfLoop(i);
    }
}

Note: this is GCC/Clang solution. Use __forceinline for MSVC.

5
On

If you don't like templates and don't trust your compiler, there's always this method, which is inspired by the outdated method of manually unrolling loops called "duff's device":

void do_something(int i);

void do_something_n_times(int n)
{
    int i = 0;
    switch(n)
    {
        default:
            while(n > 3) {
                do_something(i++);
                --n;
            }
        case 3: do_something(i++);
        case 2: do_something(i++);
        case 1: do_something(i++);
    }
}

But I think it's worth saying that if you don't trust your compiler to do something so simple as loop unrolling for you, it's probably time to consider a new compiler.

Note that Duff's device was originally invented as a micro-optimisation strategy for programs compiled with compilers that did not automatically apply loop-unrolling optimisations.

It was invented by Tom Duff in 1983.

https://en.wikipedia.org/wiki/Duff%27s_device

Its use with modern compilers is questionable.