Read int through char * binary data from a file with std::ifstream::read()

Background: This question is a follow-up to this one.
The answer given there, which suggested accessing the data through an unsigned char * instead of a char *, worked.

Main question: But what can we do if we have no choice (i.e. if char* is imposed by a function prototype)?


Context:

Let's assume that we have written an int array to a file in binary format.
It might look like this (without error checking):

const std::string bin_file("binary_file.bin");

const std::size_t len(10);
int test_data[len] {-4000, -3000, -2000, -1000, 0, 1000, 2000, 3000, 4000, 5000};

std::ofstream ofs(bin_file, std::ios::trunc | std::ios::binary);
for(std::size_t i = 0; i < len; ++i)
{
    ofs.write(reinterpret_cast<char*>(&test_data[i]), sizeof test_data[i]);
}
ofs.close();

Now I want to open the file, read it and print the previously written data one by one.

The file is opened as follows (without error checking):

std::ifstream ifs(bin_file, std::ios::binary); // open in binary mode

// get the length
ifs.seekg(0, ifs.end);
std::size_t byte_size = static_cast<std::size_t>(ifs.tellg());
ifs.seekg(0, ifs.beg);

At this point, byte_size == len*sizeof(int).
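
Since the snippets above deliberately skip error checking, here is a hedged sketch of how the open, size and read steps might look with checks added. The helper name read_all_bytes is hypothetical (it does not appear in the original code), and a std::vector<char> stands in for the raw buffer; its data() member still yields the char* that std::ifstream::read() expects.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into a char buffer.
// Throws std::runtime_error if any step fails.
std::vector<char> read_all_bytes(const std::string& path)
{
    std::ifstream ifs(path, std::ios::binary); // open in binary mode
    if (!ifs)
        throw std::runtime_error("cannot open " + path);

    // Get the length; tellg() reports -1 on failure
    ifs.seekg(0, std::ios::end);
    const std::streamoff end_pos = static_cast<std::streamoff>(ifs.tellg());
    if (end_pos < 0)
        throw std::runtime_error("cannot determine size of " + path);
    ifs.seekg(0, std::ios::beg);

    std::vector<char> buff(static_cast<std::size_t>(end_pos));
    if (!buff.empty() &&
        !ifs.read(buff.data(), static_cast<std::streamsize>(buff.size())))
        throw std::runtime_error("short read on " + path);

    return buff;
}
```

The returned buffer can then be checked against the expected size, for example by comparing buff.size() with len*sizeof(int), before any value is decoded.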


Possible solutions:

I know that I can do it either by:

int val;
for(std::size_t i = 0; i < len; ++i)
{
    ifs.read(reinterpret_cast<char*>(&val), sizeof val);
    std::cout << val << '\n';
}

or by:

int vals[len];
ifs.read(reinterpret_cast<char*>(vals), static_cast<std::streamsize>(byte_size));

for(std::size_t i = 0; i < len; ++i)
    std::cout << vals[i] << '\n';

Both of these solutions work fine, but neither is the point of this question.


Problem description:

Here I consider the case where I want to read the full contents of the binary file into a char* buffer and process it afterwards.
I cannot use an unsigned char* since std::ifstream::read() expects a char*.

I tried:

char * buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for(std::size_t i = 0; i < len; ++i)
{
    // Get the value via std::memcpy works fine
    //std::memcpy(&val, &buff[i*sizeof val], sizeof val);

    // Get the value via bit-wise shifts fails (guess: signedness issues)
    for(std::size_t j = 0; j < sizeof val; ++j)
        val |= reinterpret_cast<unsigned char *>(buff)[i*sizeof val + j] << CHAR_BIT*j; // For little-endian

    std::cout << val << '\n';
}

delete[] buff;

ifs.close();

Using std::memcpy to copy the 4 bytes into the int, I got the expected results (the printed values match the original ones).

With bit-wise shifting, even after casting the buffer with reinterpret_cast<unsigned char*>, I got garbage: the printed values do not match the original ones.

My question is: What does std::memcpy do that lets it recover the right values from a char* rather than an unsigned char*, while my bit-wise shifting cannot?
And how could I solve this without std::memcpy (out of general interest)? I could not figure it out.

There is 1 best solution below

Ok, this was a really stupid error, shame on me.

Actually, I forgot to reset val to zero before each iteration, so the bits of every previously decoded value stayed in val and were OR-ed into the next one...

The problem was not related to the bit-wise shifting, and the reinterpret_cast<unsigned char *> worked successfully.
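
To make the effect of the missing reset concrete, here is a small sketch that decodes the same bytes with and without clearing the accumulator. The function decode_all and its reset_each_time flag are hypothetical names introduced only for this illustration; the sketch assumes CHAR_BIT == 8 and 4-byte little-endian values, and it accumulates in an unsigned type to keep the demonstration free of signed-shift concerns.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes `count` little-endian 32-bit values from buff.
// When reset_each_time is false the accumulator is never cleared, which
// reproduces the bug described above.
std::vector<std::uint32_t> decode_all(const char* buff, std::size_t count,
                                      bool reset_each_time)
{
    std::vector<std::uint32_t> out;
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (reset_each_time)
            acc = 0; // start every value from a clean accumulator

        for (std::size_t j = 0; j < 4; ++j)
        {
            const unsigned char byte = static_cast<unsigned char>(buff[i * 4 + j]);
            acc |= static_cast<std::uint32_t>(byte) << (CHAR_BIT * j);
        }
        out.push_back(acc);
    }
    return out;
}
```

For a buffer holding the values 1 and 2, the version without the reset yields 1 and then 3 (1 | 2), while the version with the reset yields 1 and 2. In the question, the first value (-4000) has almost all of its high bits set, so every later value inherited them.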

The corrected version should be:

char * buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for(std::size_t i = 0; i < len; ++i)
{
    for(std::size_t j = 0; j < sizeof val; ++j)
        val |= reinterpret_cast<unsigned char *>(buff)[i*sizeof val + j] << CHAR_BIT*j; // For little-endian

    std::cout << val << '\n';
    val = 0; // Reset the val
}

delete[] buff;

ifs.close();
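
One caveat with shifting a promoted int: when a byte of 0x80 or more is moved into the top position it lands in the sign bit, which is undefined behaviour before C++20 (even though mainstream compilers produce the expected result). A possible way to sidestep this, sketched below, is to accumulate in an unsigned type and convert once at the end. The helper name decode_le32 is hypothetical, and the sketch assumes CHAR_BIT == 8 and a 32-bit value.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): decodes one little-endian
// 32-bit signed value from the 4 bytes starting at p.
std::int32_t decode_le32(const char* p)
{
    std::uint32_t bits = 0; // fresh accumulator on every call, nothing stale leaks in
    for (std::size_t j = 0; j < 4; ++j)
    {
        // char -> unsigned char drops any sign extension; widening to
        // std::uint32_t keeps the shift in unsigned arithmetic.
        const std::uint32_t byte = static_cast<unsigned char>(p[j]);
        bits |= byte << (CHAR_BIT * j);
    }
    // Modular conversion since C++20; implementation-defined before that,
    // though two's-complement targets give the expected value.
    return static_cast<std::int32_t>(bits);
}
```

With such a helper, and assuming int is 32 bits wide on the target, the inner loop could be replaced by val = decode_le32(&buff[i*sizeof val]); which also removes the need to reset val by hand.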

For those who don't like casting, we can replace it with a mask as follows:

char * buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
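for(std::size_t i = 0; i < len; ++i)
{
    unsigned int bits = 0; // Unsigned accumulator, starts from zero for each value
    for(std::size_t j = 0; j < sizeof val; ++j)
    {
        // Mask first to drop the sign extension of a negative char, then shift
        bits |= (buff[i*sizeof val + j] & 0xFFu) << CHAR_BIT*j; // For little-endian
    }
    val = static_cast<int>(bits); // val is fully overwritten, so no reset is needed

    std::cout << val << '\n';
}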

delete[] buff;

ifs.close();

A perfect example of an issue that sits between the keyboard and the chair :)