Has anybody written a C++ STL-compliant algorithm that combines std::transform and std::accumulate into a single-pass algorithm supporting the unary, binary, and perhaps even n-ary variants, say std::transformed_accumulate? I want this because I have found this pattern highly reusable in, for example, linear algebra, such as (l1-)norm calculations. The l1-norm is the sum of the absolute values of the elements.
Transform-and-Accumulate
6k views · Asked by Nordlöw · 5 answers below
                Although it may not exactly fit the original intent, std::inner_product is basically your binary version. You pass it an initial value, two ranges, and two functors, and it applies them as:
T acc = initial_value;
while (begin1 != end1) {
    acc = binary_op1(acc, binary_op2(*begin1, *begin2));
    ++begin1;
    ++begin2;
}
return acc;
So, for your L1 you'd do something on this general order:
norm = std::inner_product(input1.begin(), input1.end(),
                          input2.begin(), 0,
                          std::plus<int>(), std::abs);
Only that doesn't quite work -- it's trying to pass std::abs where inner_product really needs a binary function that combines one element from each of the two ranges, and it's not obvious how two inputs are supposed to be combined for an l1-norm over a single range.
std::partial_sum is fairly close to your unary version, except that along with accumulating a result, it (attempts to) write out each intermediate result, not just the final one. To get only the final result, you'd have to write (and pass an instance of) a kind of do-nothing iterator that just holds a single value:
// Do-nothing "output iterator" that always refers to the same value,
// so only the last result written to it survives.
template<class T, class Dist = size_t, class Ptr = T*, class Ref = T&>
class unique_it
    : public std::iterator<std::random_access_iterator_tag, T, Dist, Ptr, Ref> {
   T &value;
public:
   unique_it(T &v) : value(v) {}
   T &operator*() { return value; }              // always the same slot
   unique_it &operator++() { return *this; }     // advancing is a no-op
   unique_it &operator+(size_t) { return *this; }
   unique_it &operator++(int) { return *this; }
};

template <class T>
unique_it<T> make_res(T &v) { return unique_it<T>(v); }
With this, your L1 normalization would look something like this:
#include <cmath>
#include <iostream>
#include <numeric>

int main() {
    double result = 0.0;
    double inputs[] = {1, -2, 3, -4, 5, -6};
    std::partial_sum(
        inputs, inputs + 6,
        make_res(result),
        [](double acc, double v) { return acc + std::abs(v); });
    std::cout << result << "\t";
    return 0;
}
If you want some parallelism, I made a quick version using OpenMP:
template <class T, 
          class InputIterator, 
          class MapFunction, 
          class ReductionFunction>
T MapReduce_n(InputIterator in, 
              unsigned int size, 
              T baseval, 
              MapFunction mapper, 
              ReductionFunction reducer)
{
    T val = baseval;
    #pragma omp parallel
    {
        T map_val = baseval;
        #pragma omp for nowait
        for (auto i = 0U; i < size; ++i)
        {
            map_val = reducer(map_val, mapper(*(in + i)));
        }
        #pragma omp critical
        val = reducer(val, map_val);
    }
    return val;
}
It is fast, but there is certainly room for optimisation, especially around the for (auto i = 0U; i < size; ++i) loop, I think. (I couldn't figure out how to make an iterator-only version work with OpenMP; any help would be appreciated!)
In a quick test with an array of 1,000,000 elements, with the computation repeated 1000 times to get a mean value, I made some comparisons.
Version 1 :
for (auto i = 0U; i < size; ++i)
    val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
Timings when compiled with:
g++: 30 seconds
g++ -O3: 2.6 seconds
Version 2 :
This version is the most optimized for this particular computation, I think (it gives the best result).
#pragma omp parallel reduction( + : val )
{
    double map_val = 0.0;
    #pragma omp for
    for (int i=0; i < size; ++i)
    {
        map_val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
    }
    val += map_val;
}
g++ -O3: 0.2 seconds (it's the best one)
Version 3
This version uses the MapReduce_n function template I showed earlier:
double val = MapReduce_n(in, size, 0.0, [] (fftw_complex val)
    {
        return std::pow(val[0], 2.0) + std::pow(val[1], 2.0);
    }, std::plus<double>());
g++ -O3: 0.4 seconds, so there is a slight overhead for not using the OMP reduction directly. However, the OMP reduction doesn't allow custom operators, so at some point you (sadly) have to trade speed for genericity.
As of C++17 there is also std::transform_reduce (in &lt;numeric&gt;), which has the added benefit of being parallelizable via execution policies.
https://en.cppreference.com/w/cpp/algorithm/transform_reduce
I am surprised no one has said how to do this with Boost.Range:
accumulate(v | transformed((int(*)(int))&std::abs), 0);
where v is any Single Pass Range (i.e., any STL container). The abs overload has to be specified explicitly; otherwise this would be as elegant as Haskell.
Uhm... my bet is that you can do that by embedding your transformation into the binary functor passed to std::accumulate: transform the element, then accumulate after the transformation. This could be made generic by passing the transformation functor to a generic base functor, and you could also generalize on the type in the container... or even create a more general transform_accumulator that takes both an accumulation and a transformation functor and applies them in order. The actual implementation is left as an exercise for the reader.