Fill a vector of uint16_t using a boost::dynamic_bitset<>

796 Views Asked by At

I am creating a program that compresses files with Huffman compression. Originally I was using a vector of uint8_t to store bytes from the file, but the performance was horrible (2 hours to decompress a 74 MB file). I have decided to use 16 bit chunks to represent values from the file.

Originally, I had this (the input bitset has 520 million bits in it)

std::vector<uint8_t> bytes;
boost::dynamic_bitset<unsigned char> input;

boost::to_block_range(input, std::back_inserter(bytes));

This worked great, and it filled a vector full of 8 bit integers that represented each byte of the file. The frequencies of each bit are recorded in a vector of integers of size 256. This is running horribly. It takes absolutely forever to decode a string, since the frequencies of these integers in my file are HUGE. I thought it would be better if I used 16 bit integers, and stored frequencies in a vector of size 65536. Here is my attempt at filling my vector of "bytes":

std::vector<uint16_t> bytes;
boost::dynamic_bitset<unsigned char> input;

boost::to_block_range(input, std::back_inserter(bytes));

The problem here is that the to_block_range() function is taking 8 bits out of my bitset and padding them with 8 zeroes, rather than taking 16 bytes out at a time.

Is there any way to fill a vector of uint16_t from a dynamic bitset in this fashion?

1

There are 1 best solutions below

4
On

The problem here might not be what you think.

In your byte-based approach, adding a reserve call is likely to improve things considerably.

std::vector<uint8_t> bytes;
boost::dynamic_bitset<unsigned char> input;

bytes.reserve(input.num_blocks());
boost::to_block_range(input, std::back_inserter(bytes));

The problem with just inserting to the back of a vector is that the vector will get copied multiple times while it is growing. You can avoid this by giving it enough memory to work with.