Have anybody written a C++ STL-compliant algorithm that combines std::transform
and std::accumulate
into a single pass algorithm supporting both the unary, binary and perhaps even (n-ary!) variant, say std::transformed_accumulate
? I want this because I have found this pattern highly reusable in for example linear algebra for example in (l1-)norm calculations. The l1-norm calculates the sum of the absolute values of the elements.
Transform-and-Accumulate
6k Views Asked by Nordlöw AtThere are 5 best solutions below

Although it may not exactly fit the original intent, std::inner_product
is basically your binary version. You pass it an initial value, two ranges, and two functors, and it applies them as:
T acc = initial_value;
while (begin1 != end1) {
acc = binary_op1(acc, binary_op2(begin1, begin2);
++begin1;
++begin2;
return acc;
So, for your L1 you'd do something on this general order:
norm = std::inner_product(input1.begin(), input1.end(),
input2.begin(), input2.end(),
std::plus<int>(), std::abs);
Only that doesn't quite work -- right now, it's trying to pass std::abs
where you really need a binary function that combines the two inputs, but I'm not sure how the two inputs are really supposed to be combined.
std::partial_sum
is fairly close to your unary version, except that along with accumulating a result, it (attempts to) write out each intermediate result, not just the final result. To just get the final result, you'd have to write (and pass an instance of) a kind of do-nothing iterator that just holds a single value:
template<class T, class Dist=size_t, class Ptr = T*, class Ref = T&>
class unique_it : public std::iterator<std::random_access_iterator_tag, T, Dist, Ptr, Ref> {
T &value;
public:
unique_it(T &v) : value(v) {}
T &operator*() { return value; }
unique_it &operator++() { return *this; }
unique_it &operator+(size_t) { return *this; }
unique_it &operator++(int) { return *this; }
};
template <class T>
unique_it<T> make_res(T &v) { return unique_it<T>(v); }
With this, your L1 normalization would look something like this:
int main(){
double result=0.0;
double inputs[] = {1, -2, 3, -4, 5, -6};
std::partial_sum(
inputs, inputs+6,
make_res(result),
[](double acc, double v) {return acc + std::abs(v);});
std::cout << result << "\t";
return 0;
}

If you want to use some parallelism, I made a quick version using OpenMP :
template <class T,
class InputIterator,
class MapFunction,
class ReductionFunction>
T MapReduce_n(InputIterator in,
unsigned int size,
T baseval,
MapFunction mapper,
ReductionFunction reducer)
{
T val = baseval;
#pragma omp parallel
{
T map_val = baseval;
#pragma omp for nowait
for (auto i = 0U; i < size; ++i)
{
map_val = reducer(map_val, mapper(*(in + i)));
}
#pragma omp critical
val = reducer(val, map_val);
}
return val;
}
It is fast but there is certainly room for optimisation, especially around for (auto i = 0U; i < size; ++i)
I think. (But I couldn't figure how to make an iterator-only version with OpenMP, any help would be appreciated!).
On a quick test with 1000000 elements array, and the computation iterated 1000 times to have a mean value, I made some comparisons.
Version 1 :
for (auto i = 0U; i < size; ++i)
val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
score when compiled with:
g++
: 30 secondsg++ -O3
: 2.6 seconds
Version 2 :
This version is the most optimized for this computation I think. (It gives the best result).
#pragma omp parallel reduction( + : val )
{
double map_val = 0.0;
#pragma omp for
for (int i=0; i < size; ++i)
{
map_val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
}
val += map_val;
}
g++ -O3
: 0.2 seconds (it's the best one)
Version 3
This version uses the MapReduce_n function template I shown earlier :
double val = MapReduce_n(in, size, 0.0, [] (fftw_complex val)
{
return std::pow(val[0], 2.0) + std::pow(val[1], 2.0);
}, std::plus<double>());
g++ -O3
: 0.4 seconds, so there is a slight overhead for not using directly the OMP reduce directly. However, it doesn't allows custom operators, so at one point you (sadly) have to trade speed for genericity.

As of C++17 there is also std::transform_reduce
, which also has the benefit of being parallelizable.
https://en.cppreference.com/w/cpp/algorithm/transform_reduce

I am surprised noone said how to do this with Boost.Range:
accumulate(v | transformed((int(*)(int))&std::abs), 0);
where v is a Singe Pass Range (ie, any STL container). The abs overload has to be specified, otherwise this would be as elegant as Haskell.
Uhm... My bet is that you can do that by embedding your transformation into the binary predicate, tranform the element and accumulate after the transformation.
That functor would be equivalent to:
Of course this could be made generic, just pass the transformation functor to a generic base functor:
And you could also generalize on the type in the container... or even create a more generic transform_accumulator that takes both an accumulator and a transformation functors and applies them in order. Actual implementation left as an exercise for the reader.