How to read a file into a vector elegantly and efficiently?

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

vector<char> f1()
{
    ifstream fin{ "input.txt", ios::binary };
    // Construct the vector directly from a pair of stream iterators;
    // the file is consumed one character at a time.
    return
    {
        istreambuf_iterator<char>(fin),
        istreambuf_iterator<char>()
    };
}

vector<char> f2()
{
    vector<char> coll;
    ifstream fin{ "input.txt", ios::binary };
    char buf[1024];
    // Read the file in 1 KB chunks; the loop body runs only for full chunks.
    while (fin.read(buf, sizeof(buf)))
    {
        copy(begin(buf), end(buf),
            back_inserter(coll));
    }

    // Append the final, partial chunk (gcount() characters).
    copy(begin(buf), begin(buf) + fin.gcount(),
        back_inserter(coll));

    return coll;
}

int main()
{
    f1();
    f2();
}

Obviously, f1() is more concise than f2(), so I prefer f1() to f2(). However, I worry that f1() is less efficient than f2().

So, my question is:

Will the mainstream C++ compilers optimize f1() to make it as fast as f2()?

Update:

I tested with a 130 MB file in release mode (Visual Studio 2015 with Clang 3.8):

f1() takes 1614 ms, while f2() takes 616 ms.

f2() is faster than f1().

What a sad result!
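
For anyone who wants to reproduce the measurement, a minimal <chrono> harness along these lines would do (a sketch only; measure_ms is an illustrative helper, not necessarily the exact code used for the numbers above):

#include <chrono>
#include <iostream>

template <typename F>
long long measure_ms(F f)
{
    auto start = std::chrono::steady_clock::now();
    f();                            // read the whole file once
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main()
{
    std::cout << "f1: " << measure_ms(f1) << " ms\n";
    std::cout << "f2: " << measure_ms(f2) << " ms\n";
}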

2 Answers

Accepted Answer

I've checked your code on my side using MinGW 4.8.2. Out of curiosity I've added an additional function f3 with the following implementation:

inline vector<char> f3()
{
    ifstream fin{ "input.txt", ios::binary };
    // Determine the file size by seeking to the end and back.
    fin.seekg(0, fin.end);
    size_t len = fin.tellg();
    fin.seekg(0, fin.beg);

    // Allocate the full buffer once, then fill it with a single read.
    vector<char> coll(len);
    fin.read(coll.data(), len);
    return coll;
}

I've tested using a file roughly 90 MB long. On my platform the results were a bit different from yours:

  • f1() ~850ms
  • f2() ~600ms
  • f3() ~70ms

The results were calculated as the mean of 10 consecutive file reads.

The f3 function takes the least time: with vector<char> coll(len); all the required memory is allocated up front, so no further reallocations are needed. back_inserter, in contrast, requires the container to have a push_back member function, and for vector, push_back reallocates whenever the capacity is exceeded. As described in the docs:

push_back

This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.
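
You can observe those reallocations directly; this small sketch (mine, not from the docs) prints each capacity jump:

#include <iostream>
#include <vector>

int main()
{
    std::vector<char> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i)
    {
        v.push_back('x');
        if (v.capacity() != cap)    // capacity grew: a reallocation happened
        {
            cap = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << cap << '\n';
        }
    }
}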

Of the f1 and f2 implementations, the latter is slightly faster, even though both grow the vector incrementally (f1 via the iterator-range constructor over input iterators, f2 via back_inserter). f2 is probably faster because it reads the file in chunks, which allows some buffering to take place.
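
If you like f1's iterator style but want to avoid the reallocations, one middle ground is to reserve first. This f1_reserved is a sketch of mine, not part of the measured code; it still copies character by character, so f3's single read() will likely stay ahead:

#include <fstream>
#include <iterator>
#include <vector>

std::vector<char> f1_reserved()
{
    std::ifstream fin{ "input.txt", std::ios::binary };
    fin.seekg(0, fin.end);
    auto len = fin.tellg();
    fin.seekg(0, fin.beg);

    std::vector<char> coll;
    coll.reserve(static_cast<std::size_t>(len));  // single up-front allocation
    coll.assign(std::istreambuf_iterator<char>(fin),
                std::istreambuf_iterator<char>());
    return coll;
}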

Another Answer

If the file is smaller than a couple of GB, you can read it all at once:

#include "sys/stat.h"
        ....

char* buf;
FILE* fin;
filename="myfile.cgt";
#ifdef WIN32
   struct stat st;
  if (stat(filename, &st) == -1) return 0;
#else
    struct _stat st;
if (_stat(filename, &st) == -1) return 0;
#endif
    fin = fopen(filename, "rb");
    if (!fin) return 0;
    buf = (char*)malloc(st.st_size);
    if (!buf) {fclose(fin); return 0;}
    fread(buf, st.st_size, 1, fin);
    fclose(fin);

Needless to say, in C++ you should use new (or better, std::vector) rather than malloc().
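
For comparison, here is the same whole-file read in more idiomatic C++ (a sketch assuming C++17's std::filesystem is available; read_all is a hypothetical helper name):

#include <filesystem>
#include <fstream>
#include <vector>

std::vector<char> read_all(const std::filesystem::path& p)
{
    auto len = std::filesystem::file_size(p);     // throws on error
    std::vector<char> buf(static_cast<std::size_t>(len));
    std::ifstream fin{ p, std::ios::binary };
    fin.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    return buf;                                   // memory is freed automatically
}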