I've read various descriptions of std::string::c_str, including questions raised on SO over the years (decades, even). I like this description for its clarity:
Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object. This array includes the same sequence of characters that make up the value of the string object plus an additional terminating null-character ('\0') at the end.
However, some things about the purpose of this function are still unclear.
You could be forgiven for thinking that calling c_str might add a \0 character to the end of the string stored in the internal char array of the host object (std::string), something like:
s[s.size()] = '\0'
But it seems std::string objects are null-terminated by default, even before calling c_str:
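A minimal check of this observation, assuming a C++11 (or later) compiler where reading data()[size()] is well-defined, looks at the element one past the last character without ever calling c_str(). The function names stored_with_terminator and check_termination are hypothetical, chosen only for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper. Assumes C++11 or later: data() is specified like
// c_str(), so data()[size()] must be '\0'. c_str() is never called here.
bool stored_with_terminator(const std::string& s) {
    return s.data()[s.size()] == '\0';
}

// Hypothetical self-check over a few strings, including an empty one and
// one that has been modified after construction.
void check_termination() {
    assert(stored_with_terminator(std::string("hello")));
    assert(stored_with_terminator(std::string()));
    std::string grown(3, 'x');
    grown += "yz";  // modifying the string must keep the trailing '\0'
    assert(stored_with_terminator(grown));
}
```

Calling check_termination() from main should pass silently on a conforming C++11 library.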
After looking through the definition:
const _Elem *c_str() const _NOEXCEPT
{ // return pointer to null-terminated nonmutable array
return (this->_Myptr());
}
I don't see any code which would add \0 to the end of a char array. As far as I can tell, c_str just returns a pointer to the char stored in the first element of the array, pretty much like begin() does. I don't even see code which checks that the internal array is terminated by \0. Or am I missing something?
Before C++11, there was no requirement that a std::string (or the class template std::basic_string, of which std::string is an instantiation) store a trailing '\0'. This was reflected in the different specifications of the data() and c_str() member functions: data() returned a pointer to the underlying data (which was not required to be terminated with a '\0'), and c_str() returned a pointer to an array, possibly a copy, with a terminating '\0'. However, equally, there was no requirement NOT to store a trailing '\0' internally (accessing characters past the end of the stored data was undefined behaviour), and, for simplicity, some implementations chose to append a trailing '\0' anyway.

With C++11, this changed. Essentially, the data() member function was specified as giving the same effect as c_str() (i.e. the returned pointer is to the first character of an array that has a trailing '\0'). That has the consequence of requiring the trailing '\0' on the array returned by data(), and therefore on the internal representation.

So the behaviour you're seeing is consistent with C++11: one of the invariants of the class is a trailing '\0' (i.e. constructors ensure that is the case, member functions which modify the string ensure it remains true, and all public member functions can rely on it being true).

The behaviour you're seeing is also not inconsistent with C++ standards before C++11. Strictly speaking, std::string before C++11 was not required to maintain a trailing '\0' but, equally, an implementer could choose to do so.
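To make the invariant concrete, here is a minimal sketch of a hypothetical TinyString class (not how any particular standard library is actually written) that keeps a trailing '\0' in its buffer at all times. Because the constructors and the single modifier, push_back, maintain the terminator, c_str() can simply return the buffer pointer, much like the definition quoted in the question, with no code that appends or checks for '\0'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical minimal string type (not std::string) that keeps the C++11
// invariant: the buffer always holds size() characters followed by '\0'.
class TinyString {
public:
    // Constructors establish the invariant: an empty string is just '\0'.
    TinyString() : buf_(1, '\0') {}

    explicit TinyString(const char* s) {
        assert(s != nullptr);               // precondition: a valid C string
        buf_.assign(s, s + std::strlen(s)); // copy the characters
        buf_.push_back('\0');               // then append the terminator
    }

    // Modifiers preserve the invariant: overwrite the old terminator with
    // the new character, then write a fresh terminator after it.
    void push_back(char c) {
        buf_.back() = c;
        buf_.push_back('\0');
    }

    std::size_t size() const { return buf_.size() - 1; }

    // Since every member keeps the invariant, c_str() and data() need no
    // extra work: they just return a pointer to the first character.
    const char* c_str() const { return buf_.data(); }
    const char* data() const { return buf_.data(); }

private:
    std::vector<char> buf_;  // size() characters plus one trailing '\0'
};
```

The work that seems to be missing from c_str() lives in the constructors and modifiers instead. A real std::string has many more modifiers (insert, erase, resize, and so on), each of which has to preserve the same invariant.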