Question related to std::string object and c_str() method in 3 different implementations


I saw some strange behavior the other day. I wanted to store lines (held in a vector) in a char array, using '\n' as the delimiter.

I know the c_str() method of the string class returns a pointer to a char array ending in '\0'.

Based on my experience and understanding of C++ (see the greet0 and greet2 functions), I assumed it would work, but it didn't.

Can anyone explain the different behavior of the three greet functions? What is the scope of the object in each greet function? (I also guessed that the string object was destroyed in greet1, but if that were the case there should be a segmentation fault at cout<<"greet1:"<<w1<<endl; and that does not happen, so what exactly is happening in the background?)

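#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// The snippet where I first encountered the issue.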
const char* concatinated_str(std::vector<std::string> lines, const char *delimiter)
{
        std::stringstream buf;
        std::copy(lines.begin(), lines.end(), std::ostream_iterator<std::string>(buf, delimiter));
        string w = buf.str();
        const char *ret = w.c_str();

        return ret;

}
//Implementation 0
string greet0(){
    string msg = "hello";
    return msg;
}

//Implementation 1
const char* greet1(){
    string msg = "hello";
    cout<<&msg<<endl;
    return msg.c_str();
}

//Implementation 2
const char* greet2(){
    const char* msg = "hello";
    return msg;
}


int main(){
    auto w0 = greet0();
    cout<<&w0<<endl;
    cout<<"greet0:"<<w0<<endl;

    auto w1 = greet1();
    cout<<"greet1:"<<w1<<endl;
    
    const char* w2 = greet2();
    cout<<"greet2:"<<w2<<endl;
}

Output:

0x7fff0ff3e8e0
0x7fff0ff3e8e0
greet0:hello
greet1:
greet2:hello

There are 3 best solutions below

0
On

Returning a std::string or the pointer to a string-literal by value is perfectly fine.
Using the return value of greet1(), though, has Undefined Behavior: the std::string whose elements you try to print was destroyed at the end of its enclosing function, leaving the returned pointer dangling.

The effect of dereferencing a dangling pointer is not defined. Behaving as if the pointer referred to an empty string, because the storage has been re-used, is one of the more benign possibilities.

As an aside, the address of a std::string is rarely that interesting to someone executing your program, though printing it is perfectly fine.

1
On

In the statements cout<<&w0<<endl; and cout<<&msg<<endl; you are outputting a pointer to a std::string. Remove the & to print the string itself rather than its address. If you are mystified by the same result for two different objects, that might be because they are addresses of local variables: such objects have limited lifetimes and do not necessarily have unique locations, so the memory can be reused.

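In greet0, msg is technically a local variable that stops existing when the function exits, but the compiler may optimize the return: instead of copying msg out, it constructs the object directly in the destination w0 (named return value optimization, NRVO). Since C++17, copy elision is guaranteed only when a prvalue is returned; for a named local such as msg it remains optional, and if it is not applied, msg is moved into the result rather than copied.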

In the function

const char* greet1(){
    string msg = "hello";
    cout<<&msg<<endl;
    return msg.c_str();
}

msg is a function-local variable, so the object it names is destroyed when the function returns. From that point on the pointer obtained from c_str() is dangling, because c_str() returns a pointer to the internal storage of the std::string. That storage is gone, and accessing it is Undefined Behaviour. A segmentation fault is one possible outcome but is not guaranteed (the term comes from Unix-like systems; Windows reports an access violation instead).

In the third function

const char* greet2(){
    const char* msg = "hello";
    return msg;
}

msg points to an array containing the string literal "hello". String literals have static storage duration, the same lifetime as a global static object, and are laid out at compile time. Exiting the function does not invalidate the pointer; you can still dereference it because the string still exists.

0
On

Of the three greet functions, the only one whose result leads to undefined behavior is this one

// Implementation 1
const char* greet1(){
    string msg = "hello";
    cout<<&msg<<endl;
    return msg.c_str();
}

The local object msg of type std::string is destroyed when the function exits, so the function returns an invalid pointer.

In this function implementation

// Implementation 2
const char* greet2(){
    const char* msg = "hello";
    return msg;
}

a pointer to the first character of the string literal "hello" is returned. The literal has static storage duration, which means it is still alive after the function exits. Thus the function returns a valid pointer.

This function

// Implementation 0
string greet0(){
    string msg = "hello";
    return msg;
}

returns a temporary object of type std::string that is moved (possibly with the move elided) into the variable w0 in main

auto w0 = greet0();

So this function is correct.