Is there a limit on the number of values added to a boost::accumulator?


Is there a limit on how many values can be added to a boost::accumulator? If a large number of entries were added, is there a point at which the accumulator would cease to work properly, or is the internal algorithm robust enough to handle a set of values approaching infinity?

#include <cstddef>
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/moment.hpp>
using namespace boost::accumulators;

int main()
{
    // Define an accumulator set for calculating the mean and the
    // 2nd moment ...
    accumulator_set<double, stats<tag::mean, tag::moment<2> > > acc;

    // push in some data ...
    for (std::size_t i=0; i<VERY_LARGE_NUMBER; i++)
    {
      acc(i);
    }


    // Display the results ...
    std::cout << "Mean:   " << mean(acc) << std::endl;
    std::cout << "Moment: " << moment<2>(acc) << std::endl;

    return 0;
}

Ted Lyngmo On BEST ANSWER

If your int is a 32-bit integer, squaring it for moment<2> overflows a signed integer at 46341 * 46341 (2147488281, which exceeds the maximum of 2147483647), so your program has undefined behavior.
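The threshold can be checked without triggering the overflow by doing the multiplication in a wider type. This is a minimal sketch, not part of Boost; square_overflows_int32 is a hypothetical helper name used only for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if x * x does not fit in a 32-bit
// signed integer. The product is computed in 64 bits, so the check
// itself cannot overflow.
bool square_overflows_int32(std::int32_t x) {
    const std::int64_t wide = static_cast<std::int64_t>(x) * x;
    return wide > std::numeric_limits<std::int32_t>::max();
}
```

Calling it with 46340 returns false and with 46341 returns true, which matches the boundary quoted above.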

To avoid that, cast i to the type you're using in the accumulator:

acc(static_cast<double>(i));

The accumulator now has the same limits as a normal double. You can add as many elements as you like, as long as the internal moment calculations (x² for moment<2>, and the running sum) stay within the limit for a double (std::numeric_limits<double>::max()).
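To get a feel for where that limit sits, here is a small sketch using only the standard library. It assumes, as described above, that each sample is squared in double; the function names are hypothetical and chosen for illustration.

```cpp
#include <cmath>
#include <limits>

// Largest magnitude whose square is still a finite double
// (roughly 1.34e154 for IEEE 754 double precision).
double largest_squarable() {
    return std::sqrt(std::numeric_limits<double>::max());
}

// True if squaring x leaves the finite range of double,
// in which case the result is +infinity.
bool square_is_infinite(double x) {
    return std::isinf(x * x);
}
```

A sample of 1e154 can still be squared safely, while 1e155 already produces infinity. The running sum of squares can also reach infinity with smaller samples if enough of them are added.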

tomocafe On

The accumulator statistics do not account for overflow, so you need to select the accumulator type carefully. It doesn't need to match the type of the objects you are adding: you can cast each value when accumulating, then extract the statistics and cast the result back to the original type.

You can see it with this simple example:

#include <cstdint>
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

using namespace boost::accumulators;

int main(void) {
    accumulator_set<int8_t, features<tag::mean>> accInt8;
    accumulator_set<double, features<tag::mean>> accDouble;
    
    int8_t sum = 0; // range of int8_t: -128 to 127
    for (int8_t i = 1; i <= 100; i++) {
        sum += i; // this will overflow!
        accInt8(i); // this will also overflow
        accDouble((double)i);
    }
    
    std::cout << "sum from 1 to 100: " << (int)sum << " (overflow)\n";
    std::cout << "mean(<int8_t>): " << extract::mean(accInt8) << " (overflow)\n";
    std::cout << "mean(<double>): " << (int)extract::mean(accDouble) << "\n";
    
    return 0;
}

I used int8_t, which has a very small range (-128 to 127), to demonstrate that computing the mean of the values 1 to 100 (which should be 50.5, shown as 50 after the cast to int) overflows if you use int8_t as the internal type for the accumulator_set.

The output is:

sum from 1 to 100: -70 (overflow)
mean(<int8_t>): -7 (overflow)
mean(<double>): 50
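The -70 in the first line follows from wrapping the true sum, 5050, into the 8-bit range. Below is a minimal sketch of that arithmetic, assuming two's-complement wrap-around; wrap_to_int8 is a hypothetical helper, not part of Boost, and it uses plain integer arithmetic so that it does not depend on how the compiler narrows values.

```cpp
// Hypothetical helper: maps any integer onto the range -128..127 the way
// an 8-bit two's-complement value wraps around.
int wrap_to_int8(long long v) {
    // Shift into 0..255, reduce modulo 256 (kept non-negative), shift back.
    const long long m = ((v + 128) % 256 + 256) % 256;
    return static_cast<int>(m - 128);
}
```

With this, wrap_to_int8(5050) gives -70: 5050 is 19 * 256 + 186, and 186 lies above 127, so it wraps to 186 - 256.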