I want to parse different kinds of chunks of varying length from a file, so I wrote a function that reads out a chunk given the ifstream, like this:
void parse_next(std::ifstream& input_file, std::vector<uint8_t>& data, size_t count)
{
    std::copy_n(
        std::istreambuf_iterator<char>(input_file),
        count,
        std::back_inserter(data)
    );
}
I expected the file position to advance by count, i.e.,
// some init code
size_t const pos_before{input_file.tellg()};
parse_next(input_file, data, count);
size_t const pos_after{input_file.tellg()};
// this assumption is _not_ correct!
assert(count == (pos_after - pos_before));
// but this is!
assert((count - 1) == (pos_after - pos_before));
However, using input_file.read() with count instead of std::copy_n advances the position by the full count.
So what's going on here? I can't see anywhere in the documentation of istreambuf_iterator where this is mentioned.
Or is it the std::copy_n that is messing with me?
Note that in the example above, we can assume there is plenty of data left to read, so this is not caused by hitting the end of the file. Also, the file is opened in binary mode.
You're using istreambuf_iterator. It is an input-only iterator. Imagine that you have a file with 5 bytes and you read count=2:

1. copy_n calls sgetc to read the first byte. This does not advance the stream position.
2. Since count=2, copy_n needs one more byte, so it increments the iterator, which advances the stream position.
3. It reads the second byte with sgetc.
4. Since count=2, no more bytes are required. copy_n returns.

Note that only step 2 advances the stream position, and it only needs to happen once when reading two characters.
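To make the off-by-one concrete, here is a minimal, self-contained sketch. It uses std::istringstream as a stand-in for the ifstream (the stream-buffer iterator behaves the same way over any istream), and the count-1 result reflects the behavior of typical standard-library implementations such as libstdc++ and libc++:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copies n bytes from an in-memory stream via istreambuf_iterator and
// reports how far tellg() actually advanced. istringstream stands in
// for the binary ifstream from the question.
std::streamoff advance_after_copy_n(std::string const& bytes, std::size_t n)
{
    std::istringstream input{bytes};
    std::vector<char> data;

    auto const before = input.tellg();
    std::copy_n(std::istreambuf_iterator<char>(input), n,
                std::back_inserter(data));
    return input.tellg() - before;
}
```

Calling advance_after_copy_n("abcde", 2) copies two bytes but reports an advance of only 1, matching the (count - 1) assertion in the question: the final byte is peeked with sgetc but never consumed.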
Yes, this is strange. But most people would just use input_file.read(). I've almost never seen people use istreambuf_iterator in production code, not least because it is inefficient for your type of use case.

We could say: hey, let's change copy_n to increment the iterator before returning. That would fix this 0.1% use case, at the cost of slowing down other use cases.
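For completeness, here is one way the read()-based variant of parse_next could look; this is a sketch, not the questioner's actual code, and it again uses std::istream so an istringstream can stand in for the binary ifstream. Because istream::read consumes exactly count bytes from the buffer, tellg() advances by the full count:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <vector>

// Appends `count` bytes to `data` using istream::read, which consumes
// every byte it reads, so the stream position advances by exactly `count`.
void parse_next_read(std::istream& input, std::vector<std::uint8_t>& data,
                     std::size_t count)
{
    std::size_t const old_size = data.size();
    data.resize(old_size + count);
    input.read(reinterpret_cast<char*>(data.data() + old_size),
               static_cast<std::streamsize>(count));
}
```

Besides fixing the position, this is also the faster option: one bulk read into preallocated storage instead of a per-character peek/bump loop through the iterator.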