It is said in the documentation of the ifstream::getline method that:
The number of characters successfully read and stored by this function can be accessed by calling member gcount. https://cplusplus.com/reference/istream/istream/getline/
In any case, if count > 0, it then stores a null character CharT() into the next successive location of the array and updates gcount(). https://en.cppreference.com/w/cpp/io/basic_istream/getline
From both of the resources above it can be deduced that gcount is supposed to change even after the end of file (EOF) is encountered. That's because "in any case" includes the EOF case, and an update is only an update if it changes the target record.
It is said in the documentation of ifstream::gcount method that it:
Returns the number of characters extracted by the last unformatted input operation performed on the object. https://cplusplus.com/reference/istream/istream/gcount/
Returns the number of characters extracted by the last unformatted input operation, or the maximum representable value of std::streamsize if the number is not representable. https://en.cppreference.com/w/cpp/io/basic_istream/gcount
If it's the number of characters extracted from the ifstream, then the CPlusPlus.com documentation of getline must be wrong as it states "characters successfully read and stored".
CppReference.com would be wrong too, because it states that "in any case ... updates gcount()", but gcount is not updated when EOF is encountered before the line-end delimiter.
If it's the number of characters written into the array buffer argument of ifstream::getline, then the standard library has a bug. When during the execution of ifstream::getline the line ends prematurely with end-of-file (EOF), the null character is appended to the end of the array buffer but gcount is not updated accordingly.
Here is the code that exemplifies the dilemma.
#include <cstdlib>
#include <iostream>
#include <array>
#include <fstream>
#include <limits>
#include <cstring>
#include <string>  // std::to_string

int main(int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " file\n";
        return EXIT_FAILURE;
    }
    std::array<char, 10> buf;
    std::ifstream file;
    file.open(argv[1], std::ifstream::in);
    do {
        file.clear();
        file.getline(buf.data(), buf.size());
        std::streamsize gcount = file.gcount();
        // Stop on an unrecoverable error or when nothing was extracted.
        if (file.bad() || gcount <= 0) {
            break;
        }
        if (!file.fail()) {
            // A whole line (or the final, unterminated line) fit into the buffer.
            std::cerr
                << "LINE: [" << buf.data() << "] gcount "
                << std::to_string(gcount) << ", strlen "
                << std::to_string(std::strlen(buf.data()))
                << (file.eof() ? " (EOF)\n" : "\n");
            continue;
        }
        // Buffer must have got full. Let's skip to the end of line.
        file.clear();
        file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    while (!file.eof() && !file.bad());
    file.close();
    return EXIT_SUCCESS;
}
Here is the output I get for a text file that does not have a newline character at the end of its last line.
LINE: [dgsagdsa] gcount 9, strlen 8
LINE: [test] gcount 5, strlen 4
LINE: [test123] gcount 8, strlen 7
LINE: [123test] gcount 8, strlen 7
LINE: [] gcount 1, strlen 0
LINE: [xxxxxxx] gcount 8, strlen 7
LINE: [yy] gcount 2, strlen 2 (EOF)
As you can see, gcount is one greater than strlen on every line except the last one, where the two are equal.
That said, let's come back to the main question now.
What is meant by the number of characters extracted in the documentation of std::ifstream::gcount?
The question has two parts to it.
- What is meant by a "character"?
- What is meant by "extraction"?
Is one character always one byte in this context? A Unicode character could consist of multiple bytes. A line-end sequence could consist of multiple bytes too (CR+LF). Could it ever happen (perhaps in the future) that gcount is increased by 1 but multiple bytes were extracted? Could it ever happen that gcount is increased by 1 but multiple bytes were stored in the array buffer?
Let's take the last line in your example and walk through it - yy<eof>.
At the point of hitting EOF, two characters have been extracted and so gcount is 2. getline is now going to append a null character to your buffer - this has nothing to do with gcount. Only two characters were actually extracted.
In the case of a string with a delimiter, let's say yy<lf><eof>:
When the LF is hit, a character IS being extracted from the input, and so gcount is incremented. However, that extracted character matches the getline delimiter and so it is NOT added to your buffer. A null character gets added simply for null termination of the string.
EOF is not a character that can be extracted and so reaching it does not increment gcount.
The only wording I can see on cppreference that could maybe be disputed is this excerpt from https://en.cppreference.com/w/cpp/io/basic_istream/getline:
In any case, if count > 0, it then stores a null character CharT() into the next successive location of the array and updates gcount().
You could maybe interpret this as the appending of the null character is why gcount is being updated. However, I believe the intended meaning is that gcount is being updated because count > 0.
Regarding the question of how to determine the number of bytes written, the suggestion in the comments seems appropriate: