Are std::views sub-optimal in GCC even in simple cases?

Asked by Timo on 25 February 2024 at 12:11

I'm working on embedded code, and I was at first delighted to find that I could use std::ranges and views to simplify even performance-intensive loops: the compiler optimizes away all the iterators and produces the same assembly as if I had written the old-school loop with all the indexing done by hand.

Now, C++23 introduces views::adjacent, views::stride, etc., which would allow me to simplify even more. However, it appears that the optimizer hits a wall there. A simplified toy model, which separately sums the even-indexed and odd-indexed elements of an array:

#include <array>
#include <ranges>
#include <tuple>

// Old-school: manual indexing, two elements per iteration
std::tuple<int, int> process(const std::array<int, 16> &in)
{
    int sumL = 0;
    int sumR = 0;
    for (unsigned i = 0; i < in.size(); )
    {
        sumL += in[i++];
        sumR += in[i++];
    }
    return {sumL, sumR};
}

// Ranges, using std::views::adjacent and std::views::stride (C++23)
std::tuple<int, int> processRanges(const std::array<int, 16> &in)
{
    int sumL = 0;
    int sumR = 0;
    for (auto && [l, r] : in | std::views::adjacent<2> | std::views::stride(2))
    {
        sumL += l;
        sumR += r;
    }
    return {sumL, sumR};
}

// Ranges, using std::views::chunk (C++23)
std::tuple<int, int> processRangesChunked(const std::array<int, 16> &in)
{
    int sumL = 0;
    int sumR = 0;
    for (auto && inner: in | std::views::chunk(2))
    {
        sumL += inner[0];
        sumR += inner[1];
    }
    return {sumL, sumR};
}

Using -O3, the old-school version compiles to assembly that I couldn't improve on by hand: the loop is entirely unrolled, etc. The ranges version using adjacent and stride not only misses the unrolling, but also produces weird code that looks like a nested loop. Using chunk is a bit better, but it still produces more instructions and has a slightly less nice interface anyway. Godbolt: https://godbolt.org/z/r99seWEMz

While in this case it's a micro-optimization, in my actual use case, which has a similar structure of processing every second element differently, the compiler misses obvious and very necessary inlinings etc., completely destroying the performance.

My question(s): is it just a fact of life at the moment that more complicated loop indexing cannot be expressed using std::ranges where performance matters? Or maybe I'm writing an unnecessarily complicated view with adjacent and stride, and there's some other way that optimizes better? Perhaps by writing a custom view, like chunk but returning tuples?
