So I have a vector which has three numbers: 65, 66, and 67. I am converting these numbers from int to binary and appending them to a string. The string becomes 100000110000101000011 (65, 66, 67 respectively). I am writing this data into a file through the `dynamic_bitset` library. I have a `BitOperations` class which does the work of reading from and writing to the file. When I read the data back from the file, instead of giving the above bits it gives me these bits: 001100010100001000001.
Here is my BitOperations class:
```cpp
#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <fstream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>
#include "Utility.h"
using namespace std;
using namespace boost;

template <typename T>
class BitOperations {
private:
    T data;
    int size;
    dynamic_bitset<unsigned char> Bits;
    string fName;
    int bitSize;
public:
    BitOperations(dynamic_bitset<unsigned char> b){
        Bits = b;
        size = b.size();
    }
    BitOperations(dynamic_bitset<unsigned char> b, string fName){
        Bits = b;
        this->fName = fName;
        size = b.size();
    }
    BitOperations(T data, string fName, int bitSize){
        this->data = data;
        this->fName = fName;
        this->bitSize = bitSize;
    }
    BitOperations(int bitSize, string fName){
        this->bitSize = bitSize;
        this->fName = fName;
    }
    void writeToFile(){
        if (data != ""){
            vector<int> bitTemp = extractIntegersFromBin(data);
            for (int i = 0; i < bitTemp.size(); i++){
                Bits.push_back(bitTemp[i]);
            }
        }
        ofstream output(fName, ios::binary| ios::app);
        ostream_iterator<char> osit(output);
        to_block_range(Bits, osit);
        cout << "File Successfully modified" << endl;
    }
    dynamic_bitset<unsigned char> readFromFile(){
        ifstream input(fName);
        stringstream strStream;
        strStream << input.rdbuf();
        T str = strStream.str();
        dynamic_bitset<unsigned char> b;
        for (int i = 0; i < str.length(); i++){
            for (int j = 0; j < bitSize; ++j){
                bool isSet = str[i] & (1 << j);
                b.push_back(isSet);
            }
        }
        return b;
    }
};
```
And here is the code which calls these operations:

```cpp
#include <iostream>
// #include <string.h>
#include <boost/dynamic_bitset.hpp>
#include "Utility/BitOps.h"
int main(){
    vector<int> v;
    v.push_back(65);
    v.push_back(66);
    v.push_back(67);
    stringstream ss;
    string st;
    for (int i = 0; i < v.size(); i++){
        ss = toBinary(v[i]);
        st += ss.str().c_str();
        cout << i << " )" << st << endl;
    }
    // reverse(st.begin(), st.end());
    cout << "Original: " << st << endl;
    BitOperations<string> b(st, "bits2.bin", 7);
    b.writeToFile();
    BitOperations<string>c(7, "bits2.bin");
    boost::dynamic_bitset<unsigned char> bits;
    bits = c.readFromFile();
    string s;
    // for (int i = 0; i < 16; i++){
    to_string(bits, s);
    // reverse(s.begin(), s.end());
    // }
    cout << "Decompressed: " << s << endl;
}
```
What am I doing wrong which results in incorrect behaviour?
EDIT: Here is the extractIntegersFromBin(string s) function.
```cpp
vector<int> extractIntegersFromBin(string s){
    char tmp;
    vector<int> nums;
    for (int i = 0; s[i]; i++ ){
        nums.push_back(s[i] - '0');
    }
    return nums;
}
```
Edit 2: Here is the code for toBinary:
```cpp
stringstream toBinary(int n){
    vector<int> bin, bin2;
    int i = 0;
    while (n > 0){
        bin.push_back(n % 2);
        n /= 2;
        i++;
    }
    // for (int j = i-1; j >= 0; j--){
    //     bin2.push_back(bin[j]);
    // }
    reverse(bin.begin(), bin.end());
    stringstream s;
    for (int i = 0; i < bin.size(); i++){
        s << bin[i];
    }
    return s;
}
```
You are facing two different issues:
1. The Boost function `to_block_range` will pad the output to the internal block size by appending zeros at the end. In your case, the internal block size is `sizeof(unsigned char)*8 == 8`. So if the bit sequence you write to the file in `writeToFile` is not a multiple of `8`, additional `0`s will be written to make up a multiple of `8`. When you read the bit sequence back in with `readFromFile`, you have to find some way to remove these padding bits again.
2. There is no standard way to represent a bit sequence (reference). Depending on the scenario, it might be more convenient to represent the bits left-to-right or right-to-left (or in some completely different order). For this reason, when you use different pieces of code to print the same bit sequence and you want them to print the same result, you have to make sure that they agree on how to represent the bit sequence. If one piece of code prints left-to-right and the other right-to-left, you will get different results.
Let's discuss each issue individually:
Regarding issue 1
I understand that you want to define your own block size with the `bitSize` variable, on top of the internal block size of `boost::dynamic_bitset`. For example, in your `main` function, you construct `BitOperations<string> c(7, "bits2.bin");`. I understand that to mean that you expect the bit sequence stored in the file to have a length that is some multiple of `7`.

If this understanding is correct, you can remove the padding bits that have been inserted by `to_block_range` by reading the file size and then rounding it down to the nearest multiple of your block size. Note, though, that you currently do not enforce this contract in the `BitOperations` constructor or in `writeToFile` (i.e. by ensuring that the data size is a multiple of `7`).

In your `readFromFile` method, first note that the inner loop incorrectly takes `bitSize` into account. If `bitSize` is `7`, the loop only considers the first `7` bits of each byte, whereas the blocks that were written by `to_block_range` use all `8` bits of each 1-byte block, since `boost::dynamic_bitset` does not know anything about your 7-bit block size. This makes you miss some bits.
One way to fix this is to first calculate how many bits should be read in total, by rounding the file size (in bits) down to the nearest multiple of your block size, and then to iterate over the full bytes in the input (i.e. the internal blocks that were written by `boost::dynamic_bitset`) until the targeted number of bits has been read. The remaining padding bits are discarded.

An alternative method would be to use `boost::from_block_range`. This allows you to get rid of some boilerplate code (i.e. reading the input into some string buffer).

Regarding issue 2
Once you have solved issue 1, the `boost::dynamic_bitset` that is written to the file by `writeToFile` will be the same as the one read by `readFromFile`. If you print both with the same method, the output will match. However, if you use different methods for printing, and these methods do not agree on the order in which to print the bits, you will get different results.

For example, in the output of your program you can now see that the "Original:" output is the same as "Decompressed:", except in reverse order.
Again, this does not mean that `readFromFile` is working incorrectly, only that you are using different ways of printing the bit sequences.

The output for "Original:" is obtained by directly printing the `0`/`1` input string in `main` from left to right. In `writeToFile`, this string is then decomposed in the same order with `extractIntegersFromBin`, and each bit is passed to the `push_back` method of `boost::dynamic_bitset`. The `push_back` method appends to the end of the bit sequence, meaning it will interpret each bit you pass as more significant than the previous one (reference).

Therefore, your input string is interpreted such that the first bit in the input string is the least significant bit (i.e. the "first" bit of the sequence), and the last bit of the input string is the most significant bit (i.e. the "last" bit of the sequence).
The output for "Decompressed:", on the other hand, is constructed with `to_string`. From the documentation of this method, we can see that the least significant bit of the bit sequence will be the last character of the output string (reference).

So the problem is simply that `to_string` (by design) prints in the opposite order compared to the order in which you print the input string manually. To fix this, you have to reverse one of them, e.g. by printing the input string by iterating over it in reverse order, or by reversing the output of `to_string`.