Read a file in binary mode and store the binary data as a hex string


What I want to do.
Read a file in binary mode, convert its contents to a hex string, and display that string to the user.

What I've done.
1. Open the file in binary mode.
2. Read the first two bytes, which hold the size of the record that follows.
3. Read that many bytes and store them in a vector.
4. Convert each byte from an integer to a two-digit hexadecimal string.
5. Repeat steps 2, 3, and 4 until the end of the file.

I have confirmed that the sample code below works correctly. However, once the binary file reaches the megabyte range, the conversion alone takes about 10 seconds. Appending to the string one byte at a time inside a for loop, with a separate sprintf call for each byte, seems to be very time-consuming. Is there a faster way to perform the same operation?

<dump>
[image: hex dump of the input file]

<output>
AAAAAA
BBBB
CC

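// Needs <cstddef>, <cstdint>, <cstdio>, <fstream>, <string>, <vector>.
std::ifstream is(PATH, std::ios::in | std::ios::binary);
std::string str;

if (is)
{
    while (is.peek() != EOF)
    {
        // Each record starts with a 2-byte little-endian length.
        // unsigned char avoids sign extension for length bytes >= 0x80.
        unsigned char len[2] = {0, 0};
        if (!is.read(reinterpret_cast<char*>(len), 2))
            break;  // truncated header
        const std::size_t size = len[0] | (len[1] << 8);

        // The stream is already positioned at the payload, so no seekg() is needed.
        std::vector<uint8_t> buf(size);
        if (!is.read(reinterpret_cast<char*>(buf.data()), size))
            break;  // truncated payload

        // Convert byte by byte; this is the loop the question is about.
        for (std::size_t i = 0; i < buf.size(); i++)
        {
            char tmp[3];
            sprintf(tmp, "%02x", buf[i]);
            str += tmp;
        }
        str += "\n";
    }
}
printf("%s\n", str.c_str());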

There are 2 best solutions below

anatolyg On

In general, you should do profiling to determine what the slowest part of your program is. The result of profiling is a percentage of time each line of your code took during execution. Then take the highest percentage (most problematic line) and try to improve it.
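Before reaching for a full profiler, a rough first measurement can be made by timing the suspect section directly. The following is only a sketch: `time_ms` and `hex_with_sprintf` are hypothetical names that do not appear in the question, and the second function simply isolates the question's conversion loop so it can be measured on its own.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: runs `fn` once and returns the elapsed wall time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// The conversion loop from the question, isolated so it can be timed separately
// from the file reading.
std::string hex_with_sprintf(const std::vector<std::uint8_t>& buf)
{
    std::string str;
    for (std::size_t i = 0; i < buf.size(); ++i)
    {
        char tmp[3];
        std::snprintf(tmp, sizeof(tmp), "%02x", buf[i]);
        str += tmp;
    }
    return str;
}
```

A call such as `double ms = time_ms([&] { out = hex_with_sprintf(data); });` then shows how much of the 10 seconds the conversion is responsible for, as opposed to the file I/O or the final printf. Manual timing like this is coarse; a real profiler gives a finer breakdown.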

For example, suppose the slowest line in your code is sprintf. You can use low-level alternatives, like generating hexadecimal digits manually:

tmp[0] = buf[i] / 16 + (buf[i] / 16 < 10 ? '0' : 'a' - 10);
tmp[1] = buf[i] % 16 + (buf[i] % 16 < 10 ? '0' : 'a' - 10);

But make such optimizations only after profiling has shown that they would be effective.
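If profiling does point at the conversion, the manual-digit idea can be combined with a single up-front `reserve` and a digit lookup table. This is a sketch under assumptions, not a measured result: `records_to_hex` is a hypothetical name, it expects the whole file to be in memory already, and it assumes the record layout from the question (a 2-byte little-endian length followed by that many bytes).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: `data` holds the whole file, laid out as in the question
// (2-byte little-endian length, then that many payload bytes, repeated).
// Returns one line of lowercase hex per record.
std::string records_to_hex(const std::vector<std::uint8_t>& data)
{
    static const char digits[] = "0123456789abcdef";

    std::string out;
    // Each payload byte becomes two characters and each 2-byte header becomes
    // one '\n', so 2 * data.size() is a safe upper bound on the output size.
    out.reserve(data.size() * 2);

    std::size_t pos = 0;
    while (pos + 2 <= data.size())
    {
        const std::size_t size =
            data[pos] | (static_cast<std::size_t>(data[pos + 1]) << 8);
        pos += 2;
        if (size > data.size() - pos)
        {
            break;  // truncated record: stop rather than read past the end
        }
        for (std::size_t i = 0; i < size; ++i)
        {
            const std::uint8_t byte = data[pos + i];
            out.push_back(digits[byte >> 4]);    // high nibble
            out.push_back(digits[byte & 0x0F]);  // low nibble
        }
        out.push_back('\n');
        pos += size;
    }
    return out;
}
```

The file could be loaded in one go, for example with `std::vector<std::uint8_t> data((std::istreambuf_iterator<char>(is)), std::istreambuf_iterator<char>());`. Whether this version is actually faster on a given machine and input is something to confirm by measuring.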

Thomas Matthews On

I never use a string when writing these tools.

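// Assumes: std::ifstream my_file opened in binary mode, char buffer[16],
// std::size_t address = 0. Needs <cstddef>, <fstream>, <iomanip>, <iostream>.
// read() reports failure on a short final chunk, so gcount() is checked as well.
while (my_file.read(buffer, sizeof(buffer)) || my_file.gcount() > 0)
{
    const std::size_t bytes_to_print = static_cast<std::size_t>(my_file.gcount());
    std::cout << std::hex << std::setw(6) << std::setfill('0') << address;
    for (std::size_t i = 0; i < bytes_to_print; ++i)
    {
        if (i % 8u == 0u)
        {
            std::cout << "     ";  // extra gap before each group of 8 bytes
        }
        // Cast through unsigned char so the byte prints as a number, not a character.
        std::cout << ' ' << std::setw(2)
                  << static_cast<unsigned>(static_cast<unsigned char>(buffer[i]));
    }
    // Print the character interpretations here.
    std::cout << "\n";
    address += bytes_to_print;
}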

The printing of the printable section is left as an exercise for the reader / OP.
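For readers who want a starting point for that exercise, here is one possible sketch. `hex_dump` and `hex_dump_string` are hypothetical names, the layout (16 bytes per line, `.` for non-printable bytes) is an assumption rather than the answerer's design, and the code writes to any `std::ostream` so it can target `std::cout` or a string.

```cpp
#include <cctype>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a 16-bytes-per-line hex dump of `in` to `out`,
// followed on each line by a column of printable characters.
void hex_dump(std::istream& in, std::ostream& out)
{
    char buffer[16];
    std::size_t address = 0;
    const std::ios_base::fmtflags saved_flags = out.flags();
    const char saved_fill = out.fill('0');
    out << std::hex;

    // read() fails on a short final chunk, so gcount() is checked as well.
    while (in.read(buffer, sizeof(buffer)) || in.gcount() > 0)
    {
        const std::size_t count = static_cast<std::size_t>(in.gcount());
        out << std::setw(6) << address << ' ';
        for (std::size_t i = 0; i < sizeof(buffer); ++i)
        {
            if (i < count)
            {
                out << ' ' << std::setw(2)
                    << static_cast<unsigned>(static_cast<unsigned char>(buffer[i]));
            }
            else
            {
                out << "   ";  // pad a short final line so the text column lines up
            }
        }
        out << "  ";
        for (std::size_t i = 0; i < count; ++i)
        {
            const unsigned char c = static_cast<unsigned char>(buffer[i]);
            out << (std::isprint(c) ? static_cast<char>(c) : '.');
        }
        out << '\n';
        address += count;
    }

    // Restore the caller's formatting state.
    out.flags(saved_flags);
    out.fill(saved_fill);
}

// Convenience wrapper for callers that want the dump as a string.
std::string hex_dump_string(std::istream& in)
{
    std::ostringstream out;
    hex_dump(in, out);
    return out.str();
}
```

Calling `hex_dump(my_file, std::cout);` streams the dump straight to the console, which keeps the "no intermediate string" approach of the answer above.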