A pragmatic loop unrolling technique

557 Views Asked by At

I'm finding a pragmatic loop unrolling technique example.

I think Duff's device is a one of nice tip.
But Duff's device's destination is never increased. It could be useful for embeded programmer who copies data to serial device, not general programmers.

Could you give me a nice and useful example?
If you have ever used it in your real code, it will be better.

3

There are 3 best solutions below

0
On

The most pragmatic technique would be to learn and love your compiler's optimization options, and occasionally inspect the generated assembly by hand if you encounter hotspots in profiling.

0
On

(For the benefit of others, background on Duff's device can be found here and here)

I've encountered it in image processing optimizations, especially to handle border conditions where fewer pixels than a complete tile or kernel are to be copied (and this can avoid the test at each coordinate.)

1
On

I'm not sure what you mean by "destination is never increased."

Manual loop unrolling is rather uncommon. Embedded microprocessors today are fast enough that such optimization is unnecessary (and would waste valuable program memory).

I use a variation of Duff's device in a linear solver kernel. There must be one back_step for each fwd_step, and they are performed in groups of four.

Note that the forward and backward-going loops are implemented by gotos. When the if in fwd_step is skipped, execution jumps into the middle of the backward loop. So it's really a kind of double Duff's device.

This isn't any kind of "pragmatic" technique, it's just the best way I could find to express some very convoluted flow control.

switch ( entry ) {

#define fwd_step( index )                                                                                         \
                                                                                                     \
case (index):                                                                                                    \
    if ( -- count ) {                                                                                               \
        ...

startf:
    fwd_step( 0 )
    fwd_step( 1 )
    fwd_step( 2 )
    fwd_step( 3 )

        stream = stream_back;
        goto startf;


#define back_step( index )                                                            \
        .... \
    }                                                                                    \

startb:
    stream -= block_size;

    back_step( 3 )
    if ( ! -- countb ) break;
    back_step( 2 )
    if ( ! -- countb ) break;
    back_step( 1 )
    if ( ! -- countb ) break;
    back_step( 0 )
        if ( -- countb ) goto startb;
} // end switch