Transform-and-Accumulate


Has anybody written a C++ STL-compliant algorithm that combines std::transform and std::accumulate into a single-pass algorithm, supporting the unary, binary, and perhaps even n-ary(!) variants, say std::transformed_accumulate? I want this because I have found the pattern highly reusable, for example in linear algebra for (l1-)norm calculations. The l1-norm is the sum of the absolute values of the elements.

There are 5 answers below.

Answer (score 2):

Uhm... My bet is that you can do that by embedding your transformation into the binary functor: transform the element, then accumulate after the transformation.

struct times2accumulator {
   int operator()( int oldvalue, int newvalue ) const {
      return oldvalue + 2*newvalue;
   }
};
int r = std::accumulate( v.begin(), v.end(), 0, times2accumulator() );

That functor would be equivalent to:

struct times2 {
   int operator()( int x ) {
      return 2*x;
   }
};
std::vector<int> tmp; tmp.reserve( v.size() );
std::transform( v.begin(), v.end(), std::back_inserter(tmp), times2() );
int r = std::accumulate( tmp.begin(), tmp.end(), 0 );

Of course this could be made generic, just pass the transformation functor to a generic base functor:

template <typename Transform>
struct transform_accumulator_t {
    Transform t;
    transform_accumulator_t( Transform t ) : t(t) {}
    int operator()( int oldvalue, int newvalue ) const {
        return oldvalue + t(newvalue);
    }
};
// syntactic sugar:
template <typename T>
transform_accumulator_t<T> transform_accumulator( T t ) {
    return transform_accumulator_t<T>(t);
}
int r = std::accumulate(v.begin(), v.end(), 0, transform_accumulator(times2()));

And you could also generalize on the value type in the container... or even create a more generic transform_accumulator that takes both an accumulation functor and a transformation functor and applies them in order. The actual implementation is left as an exercise for the reader.

Answer (score 5):

Although it may not exactly fit the original intent, std::inner_product is basically your binary version. You pass it an initial value, a full first range plus the beginning of a second, and two functors, and it applies them roughly as:

T acc = initial_value;
while (begin1 != end1) {
    acc = binary_op1(acc, binary_op2(*begin1, *begin2));
    ++begin1;
    ++begin2;
}
return acc;

So, for your L1 you'd do something on this general order:

norm = std::inner_product(input1.begin(), input1.end(),
                          input2.begin(), 0,
                          std::plus<int>(), std::abs);

Only that doesn't quite work -- right now, it's trying to pass std::abs where you really need a binary function that combines the two inputs, but I'm not sure how the two inputs are really supposed to be combined.

std::partial_sum is fairly close to your unary version, except that along with accumulating a result, it (attempts to) write out each intermediate result, not just the final result. To just get the final result, you'd have to write (and pass an instance of) a kind of do-nothing iterator that just holds a single value:

template<class T, class Dist=size_t, class Ptr = T*, class Ref = T&>
class unique_it : public std::iterator<std::random_access_iterator_tag, T, Dist, Ptr, Ref> { 
   T &value;
public:
   unique_it(T &v) : value(v) {}
   T &operator*() { return value; }
   unique_it &operator++() { return *this; }
   unique_it &operator+(size_t) { return *this; }
   unique_it &operator++(int) { return *this; }
};

template <class T>
unique_it<T> make_res(T &v) { return unique_it<T>(v); }

With this, your L1 normalization would look something like this:

#include <cmath>
#include <iostream>
#include <numeric>

int main(){ 
    double result = 0.0;
    double inputs[] = {1, -2, 3, -4, 5, -6};

    std::partial_sum(
        inputs, inputs + 6, 
        make_res(result),
        [](double acc, double v) { return acc + std::abs(v); });

    std::cout << result << "\n";
    return 0;
}

Answer (score 2):

If you want some parallelism, I made a quick version using OpenMP:

template <class T, 
          class InputIterator, 
          class MapFunction, 
          class ReductionFunction>
T MapReduce_n(InputIterator in, 
              unsigned int size, 
              T baseval, 
              MapFunction mapper, 
              ReductionFunction reducer)
{
    T val = baseval;

    #pragma omp parallel
    {
        // Note: baseval should be the identity of `reducer`,
        // since each thread starts its partial result from it.
        T map_val = baseval;

        #pragma omp for nowait
        for (auto i = 0U; i < size; ++i)
        {
            map_val = reducer(map_val, mapper(*(in + i)));
        }

        #pragma omp critical
        val = reducer(val, map_val);
    }

    return val;
}

It is fast, but there is certainly room for optimisation, especially around the for (auto i = 0U; i < size; ++i) loop, I think. (I couldn't figure out how to make an iterator-only version with OpenMP; any help would be appreciated!)

On a quick test with a 1,000,000-element array, with the computation repeated 1000 times to get a mean value, I made some comparisons.

Version 1 :

for (auto i = 0U; i < size; ++i)
    val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);

Timings when compiled with:

  • g++ : 30 seconds
  • g++ -O3 : 2.6 seconds

Version 2 :

I think this version is the most optimized possible for this computation (it gives the best result).

#pragma omp parallel reduction( + : val )
{
    double map_val = 0.0;

    #pragma omp for
    for (int i=0; i < size; ++i)
    {
        map_val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
    }

    val += map_val;
}
  • g++ -O3 : 0.2 seconds (it's the best one)

Version 3

This version uses the MapReduce_n function template I showed earlier:

double val = MapReduce_n(in, size, 0.0, [] (fftw_complex val)
    {
        return std::pow(val[0], 2.0) + std::pow(val[1], 2.0);
    }, std::plus<double>());
  • g++ -O3 : 0.4 seconds, so there is a slight overhead compared to using the OpenMP reduction clause directly. However, the reduction clause doesn't allow custom operators, so at some point you (sadly) have to trade speed for genericity.

Answer (score 0):

As of C++17 there is also std::transform_reduce, which has the added benefit of being parallelizable via execution policies.

https://en.cppreference.com/w/cpp/algorithm/transform_reduce

Answer (score 0):

I am surprised no one has mentioned how to do this with Boost.Range:

accumulate(v | transformed((int(*)(int))&std::abs), 0);

where v is any Single Pass Range (i.e., any STL container). The abs overload has to be selected with the cast; otherwise this would be as elegant as Haskell.