boost multiprecision cpp_int output/to string very slow

481 Views Asked by At

Playing around with the boost multiprecision library. Calculating some big factorial numbers and such.

Problem is the output takes too long. 100,000! takes 0.5 seconds to calculate and 11 seconds to print. 1,000,000! takes half an hour (yes the output only).

Using cout for the output with > to put it to a file: ./prg > file

Tried putting it to a string first, stringstream, normal cout. Everything the same.

The convertion to a string just takes very long. Is there any way to speed it up?

Code example:

#include <iostream>
#include <chrono>
#include <string>
#include <boost/multiprecision/cpp_int.hpp>

int main() {
    uint32_t num = 100000;
    boost::multiprecision::cpp_int result = 1;

    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

    for (uint32_t i = 2; i <= num; i++) {
        result *= i;
    }

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    std::cout << "calculation: " << std::chrono::duration_cast<std::chrono::milliseconds> (end - begin).count() / 1000.0 << " sec" << std::endl;

    std::string s = result.str();

    std::chrono::steady_clock::time_point endOutput = std::chrono::steady_clock::now();
    std::cout << "toString: " << std::chrono::duration_cast<std::chrono::milliseconds> (endOutput - end).count() / 1000.0 << " sec" << std::endl;

    std::cout << "length: " << s.length() << std::endl;

    return 0;
}

output:

calculation: 1.014 sec
toString: 7.643 sec
length: 456574

output same code in java using BigInteger:

calculation: 2.646 sec
toString: 0.466 sec
length: 456574
1

There are 1 best solutions below

0
On

I was going to recommend you post a self-answer. But then I had this crammed into a single comment:

feel free to self-answer @Richard, so the hint will help others. Also, a pet-peeve of mine: C++ doesn't need to look horrible: https://godbolt.org/z/53d338843. In fact I'd go further and make a lap function https://godbolt.org/z/6nf659Y91. Also nice: https://wandbox.org/permlink/3KCKBn1tOwwJCkIz

So I figured that I might as well post it here for added value of demonstration code.

Modernizing The Repro

We can express that program a lot cleaner:

#include <boost/multiprecision/gmp.hpp>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;
namespace bmp = boost::multiprecision;
auto now      = std::chrono::steady_clock::now;

auto factorial(uint32_t num) {
    bmp::mpz_int result{1};
    while (num)
        result *= num--;
    return result;
}

int main() {
    auto start = now();
    auto result = factorial(100'000);

    auto mid = now();
    std::cout << "calculation: " << (mid - start) / 1.s << "s" << std::endl;

    std::string s = result.str();

    std::cout << "toString: "    << (now() - mid) / 1.s << "s" << std::endl;
    std::cout << "length: "      << s.length()          << "\n";
}

Which may print something like

calculation: 2.17467s
toString: 0.0512504s
length: 456574

More Comparative Benchmarks

To see how much better mpz_int may perform, let's compare them:

Live On Wandbox

#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/gmp.hpp>
#include <chrono>
#include <iostream>

using namespace std::chrono_literals;
namespace bmp = boost::multiprecision;

template <typename T> T factorial(uint32_t num) {
    T result{1};
    while (num)
        result *= num--;
    return result;
}

#define TIMED(expr)                                                            \
    [&]() -> decltype(auto) {                                                  \
        using C = std::chrono::steady_clock;                                   \
        using namespace std::chrono_literals;                                  \
        struct X {                                                             \
            C::time_point s = C::now();                                        \
            ~X() {                                                             \
                std::cerr << std::fixed << (C::now() - s) / 1.s << "s\t"       \
                          << #expr << std::endl;                               \
            }                                                                  \
        } x;                                                                   \
        return (expr);                                                         \
    }()

template <typename T> void bench() {
    auto r = TIMED(factorial<T>(50'000));
    auto s = TIMED(r.str());
    std::cout << "length: " << s.length() << "\n";
}

int main() {
    TIMED(bench<bmp::mpz_int>());
    std::cout << "-----\n";
    TIMED(bench<bmp::cpp_int>());
}

Which may print something like

0.953427s       factorial<T>(100'000)
0.040691s       r.str()
length: 456574
0.994284s       bench<bmp::mpz_int>()
-----
1.410608s       factorial<T>(100'000)
8.014350s       r.str()
length: 456574
9.425064s       bench<bmp::cpp_int>()

As you can see GMP is orders of magnitude more optimized (~200x faster for the str() operation in my test run)