I am writing a file compression and decompression program, and I don't know how to handle the remaining bits when I decompress.
For example, say bits.length() is 63. Since a byte is 8 bits, bits.length() % 8 = 7, so there are still 7 bits left over. Now whenever I decompress the file, a character is missing.
Here is my compression code:
void compressFile(string inputFile) {
huffmanTree();
system("cls");
cout << "\n\n\t\t\t\tProcessing...";
Sleep(5000);
ifstream inputedFile(inputFile, ios::binary); // Open the file in binary mode
ofstream compressedFile("compressed.huff", ios::binary); // Binary mode so written bytes are not translated
if (!inputedFile.is_open() || !compressedFile.is_open()) {
cout << "\t\t\t\tError: Unable to open file for compression." << endl;
return;
}
string bits; // Use a string to accumulate bits for each character
char ch;
while (inputedFile.get(ch)) {
bits += treeCode[(int)ch];
if(bits.length() >= 8){
// Process complete groups of 8 bits
for (int i = 0; i + 8 <= bits.length(); i += 8) {
compressedFile.put((char)stoi(bits.substr(i, 8), NULL, 2));
}
bits = bits.substr(bits.length() - bits.length() % 8);
}
}
if (!bits.empty()) {
compressedFile.put((char)stoi(bits, NULL, 2));
}
system("cls");
cout << "\t\t\t\t---------------------------------------------" << endl;
cout << "\n\n\t\t\t\tSuccessful: File has been compressed." << endl;
cout << "\n\n\t\t\t\tThe file name is compressed.huff." << endl;
cout << "\t\t\t\t---------------------------------------------" << endl;
cout << "\t\t\t\t";
system("pause");
inputedFile.close();
compressedFile.close();
}
Here is my decompression code:
void decompressFile(string compressedFile) {
system("cls");
cout << "\n\n\t\t\t\tProcessing...";
Sleep(5000);
ifstream compressedFileStream(compressedFile, ios::binary); // Open the file in binary mode
ofstream decompressedFile("decompressed.txt");
if (!compressedFileStream.is_open() || !decompressedFile.is_open()) {
cout << "\n\t\t\t\tError: Unable to open file for decompression." << endl;
return;
}
huffmanTree();
Node* root = head->node; // Save the root of the Huffman tree
Node* current = root; // Initialize the current node
char byte; // Read bytes for decompression
while (compressedFileStream.get(byte)) {
for (int i = 7; i >= 0; i--) {
// Traverse the tree based on each bit in the byte
char bit = (byte & (1 << i)) ? '1' : '0';
if (bit == '0') {
current = current->left;
}
else if (bit == '1') {
current = current->right;
}
if (current->left == NULL && current->right == NULL) {
decompressedFile << current->character;
cout << "decompressed" <<current->character;
current = root; // Reset current to the root for the next character
}
}
}
system("pause");
system("cls");
cout << "\t\t\t\t---------------------------------------------" << endl;
cout << "\n\n\t\t\t\tSuccessful: File has been decompressed." << endl;
cout << "\n\n\t\t\t\tThe file name is decompressed.txt." << endl;
cout << "\t\t\t\t---------------------------------------------" << endl;
cout << "\t\t\t\t";
system("pause");
compressedFileStream.close();
decompressedFile.close();
}
What should I do to decompress my compressed file without losing a character?
Your code currently does:
I see some issues with it:
if (compressedFileStream.eof()) in the for loop looks like you expect a file to end in the middle of a byte. That does not happen.
Moreover, once you have done the test, it only makes sense to do it again after reading from the file: if you do not read anything, the value returned by it will not change.
If you managed to read a byte, you know it is valid; hence your test would make much more sense to do right before reading the next byte.
With that in mind, it seems all you have to do is move the eof test right before reading from the file:
PS:
Depending on what compressedFileStream was (you did not mention it in your question and I did not want to make any assumption there), it may be possible to just test the value returned by get to know when you are trying to read after the file end.
char bit = (byte & (1 << i)) ? '1' : '0'
NULL is valid but not idiomatic in C++. Use nullptr in the future (or test the pointers the way I did).
Note: during compression, you must ensure the last bits of the last byte cannot be interpreted as a character.
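One way to handle the last byte can be sketched as follows. In the compression code from the question, the leftover bits go through stoi, which right-aligns them: "1010101" becomes the byte 01010101, while the decompressor reads each byte starting from bit 7, so it sees an extra leading 0. The sketch below pads the leftover bits on the right instead and records how many bits are meaningful, so the padding is never decoded. The names PackedBits, packBits and unpackBits are hypothetical, and storing the bit count next to the data is an assumption about the file format, not something taken from the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for this sketch.
struct PackedBits {
    std::vector<unsigned char> bytes; // bits packed most significant bit first
    std::size_t bitCount;             // number of meaningful bits in `bytes`
};

// Packs a string of '0'/'1' characters into bytes, MSB first.
PackedBits packBits(const std::string& bits) {
    PackedBits out{{}, bits.size()};
    unsigned char current = 0;
    int used = 0;
    for (char c : bits) {
        current = static_cast<unsigned char>((current << 1) | (c == '1' ? 1 : 0));
        if (++used == 8) {
            out.bytes.push_back(current);
            current = 0;
            used = 0;
        }
    }
    if (used > 0) {
        // Left-align the leftover bits; the low (8 - used) bits are zero padding.
        out.bytes.push_back(static_cast<unsigned char>(current << (8 - used)));
    }
    return out;
}

// Recovers exactly bitCount bits, reading each byte from bit 7 down to bit 0.
// Assumes bitCount <= bytes.size() * 8.
std::string unpackBits(const PackedBits& packed) {
    std::string bits;
    for (std::size_t i = 0; i < packed.bitCount; ++i) {
        unsigned char byte = packed.bytes[i / 8];
        bits += (byte & (1u << (7 - i % 8))) ? '1' : '0';
    }
    return bits;
}
```

In a file, the bit count (or the number of original characters) would have to be written somewhere the decompressor can find it, for example in a small header before the packed bytes. The decompression loop then stops after that many bits instead of walking the tree through the padding.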