I am writing a program that compresses files with Huffman coding. Originally I was using a vector of uint8_t to store bytes from the file, but the performance was horrible (2 hours to decompress a 74 MB file). I have decided to use 16-bit chunks to represent values from the file.
Originally, I had this (the input bitset has 520 million bits in it)
std::vector<uint8_t> bytes;
boost::dynamic_bitset<unsigned char> input;
boost::to_block_range(input, std::back_inserter(bytes));
This worked great: it filled a vector with 8-bit integers representing each byte of the file. The frequency of each byte value is recorded in a vector of 256 integers. But it runs horribly; decoding a string takes forever, because the frequencies of these byte values in my file are huge. I thought it would be better to use 16-bit integers and store frequencies in a vector of size 65536. Here is my attempt at filling my vector of "bytes":
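For context, the frequency table for the byte-based version is built in a single pass over the bytes (a minimal sketch; the function and variable names are placeholders, not my actual code):

```cpp
#include <cstdint>
#include <vector>

// Count how often each byte value occurs; this table drives the Huffman tree.
std::vector<std::size_t> count_frequencies(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::size_t> freq(256, 0);
    for (std::uint8_t b : bytes)
        ++freq[b];
    return freq;
}
```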
std::vector<uint16_t> bytes;
boost::dynamic_bitset<unsigned char> input;
boost::to_block_range(input, std::back_inserter(bytes));
The problem here is that to_block_range() takes 8 bits out of my bitset and pads them with 8 zeroes, rather than taking 16 bits at a time. Presumably that is because the bitset's block type is still unsigned char, so each 8-bit block just gets widened to a uint16_t on insertion.
Is there any way to fill a vector of uint16_t from a dynamic bitset in this fashion?
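The only workaround I've found so far is to keep the 8-bit extraction and pack pairs of bytes into 16-bit values myself (a sketch; the pairing order is an assumption on my part, and a trailing odd byte is zero-padded). Declaring the bitset as boost::dynamic_bitset<std::uint16_t> might also yield 16-bit blocks directly, if changing the block type is an option.

```cpp
#include <cstdint>
#include <vector>

// Pack consecutive byte pairs into 16-bit words, first byte in the low bits.
// Whether low-byte-first matches your bit numbering depends on how the
// bitset was filled; swap the halves if not. A trailing odd byte ends up
// zero-padded in the high half.
std::vector<std::uint16_t> pack_pairs(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> out;
    out.reserve((bytes.size() + 1) / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        std::uint16_t word = bytes[i];
        if (i + 1 < bytes.size())
            word |= static_cast<std::uint16_t>(bytes[i + 1]) << 8;
        out.push_back(word);
    }
    return out;
}
```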
The problem here might not be what you think.
In your byte-based approach, adding a reserve() call is likely to improve things considerably. The problem with just inserting at the back of a vector is that its elements get copied every time the vector reallocates as it grows. You can avoid this by giving it enough memory to work with up front.