I noticed that destructors of binary_oarchive and filtering_streambuf are extremely slow and I don't know why. I'm running the following code:
#include <boost/archive/binary_oarchive.hpp>
#include <boost/filesystem/fstream.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <chrono>
#include <iostream>
namespace bfs = boost::filesystem;
namespace bar = boost::archive;
namespace bio = boost::iostreams;
using clock = std::chrono::steady_clock;
using time_point = std::chrono::time_point<clock>;
using duration = std::chrono::duration<double>;
// main code:
time_point start, now;
duration diff;
bfs::ofstream file(path, std::ios_base::out | std::ios_base::binary);
{
bio::filtering_streambuf<bio::output> stream;
stream.push(bio::gzip_compressor());
stream.push(file);
{
bar::binary_oarchive oarch(stream);
start = clock::now();
oarch << BIG_OBJECT; // results in around 2GiB of output
now = clock::now();
diff = now - start;
std::cout << "Writing took " << diff.count() << " seconds." << std::endl;
start = now;
} // 'oarch' destructor should run
now = clock::now();
diff = now - start;
std::cout << "'oarch' destructor took " << diff.count() << " seconds." << std::endl;
start = now;
} // 'stream' destructor should run
now = clock::now();
diff = now - start;
std::cout << "'stream' destructor took " << diff.count() << " seconds." << std::endl;
The output that I'm getting is:
Writing took 1709.93 seconds. // around 28 minutes
'oarch' destructor took 2226.82 seconds. // around 37 minutes
'stream' destructor took 2177.07 seconds. // around 36 minutes
I ran md5sum on the output file every 10 seconds to see if anything changed:
- after the writing has finished, the output file was around 2GiB in size,
- while the 'oarch' destructor was running, the contents and the size of the output file didn't change (the md5 hash was the same the entire time),
- after the 'oarch' destructor finished, the file grew only by around 2KiB,
- after the 'stream' destructor finished, the file was the same as when the 'oarch' destructor finished (the same size and md5 hash).
So, why are those destructors so slow, even though barely anything happens to the output file after it has been written to?
A possibly important point: while running, the program uses 8GiB of RAM and ~70GiB of swap.
Three steps:
First: Naively Reproducing
I've simplified your code and made it self-contained:
Live On Coliru
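For reference, a minimal sketch of what such a self-contained reproduction could look like; the choice of BIG_OBJECT as ~2GiB of pseudo-random integers is my assumption, not necessarily what the linked Coliru program does:
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/serialization/vector.hpp>
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <random>
#include <vector>
namespace bar = boost::archive;
namespace bio = boost::iostreams;
int main() {
    using clock = std::chrono::steady_clock;
    auto start  = clock::now();
    auto report = [&start](char const* what) {
        std::chrono::duration<double> diff = clock::now() - start;
        std::cout << what << " took " << diff.count() << " seconds" << std::endl;
        start = clock::now();
    };
    // ~2GiB of incompressible data standing in for BIG_OBJECT
    std::vector<std::uint64_t> data(256ull << 20);
    std::generate(data.begin(), data.end(), std::mt19937_64{42});
    auto const& BIG_OBJECT = data;
    std::ofstream file("output.bin", std::ios::binary);
    {
        bio::filtering_streambuf<bio::output> stream;
        stream.push(bio::gzip_compressor());
        stream.push(file);
        {
            bar::binary_oarchive oarch(stream);
            start = clock::now();
            oarch << BIG_OBJECT;
            report("Writing");
        }
        report("'oarch' destructor");
    }
    report("'stream' destructor");
}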
Which unsurprisingly prints (online):
And when run locally with the 2.1GiB output.bin:
What gives?
Second: Object Tracking
Having some experience with the library, I suspect that your huge object is actually an object graph with types that engage the archive's Object Tracking. This feature is what makes aliasing pointers (including shared pointers etc.) work: if your graph contains the same object in several nodes, it is only serialized once.
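As a tiny illustration of what that means, with a hypothetical type Node that has a trivial serialize member:
struct Node {
    int payload = 0;
    template <typename Ar> void serialize(Ar& ar, unsigned /*version*/) { ar & payload; }
};
// ...
Node shared_node;
Node const* a = &shared_node;
Node const* b = &shared_node; // aliases the same object
oarch << a << b; // the payload is written once; the second pointer is stored
                 // as a reference to the already-tracked object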
So, replacing BIG_OBJECT with a vector of custom types:
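The replacement might look roughly like this, dropped into the sketch above in place of the random data. This is my reconstruction; in particular, forcing track_always on the element type is an assumption to make sure tracking is engaged (the original Coliru program may achieve the same differently):
#include <boost/serialization/tracking.hpp>
#include <boost/serialization/vector.hpp>
struct Element {
    double value = 0;
    template <typename Ar>
    void serialize(Ar& ar, unsigned /*version*/) { ar & value; }
};
// Force tracking on, to mimic an object graph whose types engage the
// archive's object tracking:
BOOST_CLASS_TRACKING(Element, boost::serialization::track_always)
// In main(), standing in for BIG_OBJECT:
constexpr std::size_t NUM_ELEMENTS = 90'000'000; // "90Mio"
std::vector<Element> const BIG_OBJECT(NUM_ELEMENTS);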
This NUM_ELEMENTS of 90Mio is about the largest I can run on my 32GiB system. This is due to bookkeeping overhead for object tracking in the archive.
Output: (compare Live On Coliru)
As you can see, the serialization time is dominated by allocation overhead for object tracking:

and the destructor (completing the archive) deals with deallocation:

The stream's destructor does a close, which does zlib_base::reset, flushing the compression; this apparently takes significant time for this 529MiB archive (859 MiB uncompressed).
Third: Workaround/Fix
I'd say it's really unlikely for your object graph to be so finely grained and tangled that object tracking overhead becomes the bottleneck.
This tells me that in reality you might not care about object tracking for your type. You can signal this:
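Presumably something along the lines of (using the hypothetical Element type from the sketch above):
#include <boost/serialization/tracking.hpp>
// Never track objects of this type:
BOOST_CLASS_TRACKING(Element, boost::serialization::track_never)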
With only that modification, the same code now runs in ~38% of the time:
(compare Live On Coliru)
Most importantly, there's no more object-tracking overhead slowing down the archive's completion.
If you want to disable object tracking wholesale, use the archive flag for that:
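A sketch, again assuming the setup from above; boost::archive::no_tracking disables tracking for everything written through that particular archive:
// Pass the no_tracking flag when constructing the archive:
bar::binary_oarchive oarch(stream, bar::no_tracking);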
CAUTION: Only do this when you know what the implications are for your application. It is entirely possible to break your serialization semantics by using this improperly.
Again, compare Coliru.