Returning an object: value, pointer and reference

2.1k Views Asked by At

I know this has probably been asked and I've looked through other answers, but still I cannot get this completely. I want to understand the difference between the two following codes:

MyClass getClass(){
return MyClass();
}

and

MyClass* returnClass(){
return new MyClass();
}

Now let's say I call such functions in a main:

MyClass what = getClass();
MyClass* who = returnClass();
  1. If I got this straight, in the first case the object created in the function scope will have automatic storage, i.e. when you exit the scope of the function its memory block will be freed. Also, before freeing such memory, the returned object will be copied into the "what" variable I created. So there will exist only one copy of the object. Am I correct?

    1a. If I'm correct, why is RVO (Return Value Optimization) needed?

  2. In the second case, the object will be allocated through a dynamic storage, i.e. it will exist even out of the function scope. So I need to use a deleteon it. The function returns a pointer to such object, so there's no copy made this time, and performing delete who will free the previously allocated memory. Am I (hopefully) correct?

  3. Also I understand I can do something like this:

    MyClass& getClass(){
    return MyClass();
    }
    

    and then in main:

     MyClass who = getClass();
    

    In this way I'm just telling that "who" is the same object as the one created in the function. Though, now we're out of the function scope and thus that object doesn't necessarily exists anymore. So I think this should be avoided in order to avoid trouble, right? (and the same goes for

    MyClass* who = &getClass();
    

    which would create a pointer to the local variable).

Bonus question: I assume that anything said till now is also true when returning vector<T>(say, for example, vector<double>), though I miss some pieces. I know that a vector is allocated in the stack while the things it contains are in the heap, but using vector<T>::clear() is enough to clear such memory. Now I want to follow the first procedure (i.e. return a vector by value): when the vector will be copied, also the onjects it contains will be copied; but exiting the function scope destroys the first object. Now I have the original objects that are contained nowhere, since their vector has been destroyed and I have no way of deleting such objects that are still in the heap. Or maybe a clear() is performed automatically?

I know that I may beatray some misunderstandings in these subjects (expecially in the vector part), so I hope you can help me clarify them.

2

There are 2 best solutions below

6
On BEST ANSWER

Q1. What happens conceptually is the following: you create an object of type MyClass on the stack in the stack frame of getClass. You then copy that object into the return value of the function, which is a bit of stack that was allocated before the function call to hold this object. Then the function returns, the temporary gets cleaned up. You copy the return value into the local variable what. So you have one allocation and two copies. Most (all?) compilers are smart enough to omit the first copy: the temporary is not used except as return value. However, the copy from the return value into the local variable on the caller side cannot be omitted, because the return value lives on a part of the stack that is freed as soon as the function finishes.

Q1a. Return Value Optimization (RVO) is a special feature, that does allow that final copy to be elided. That is, instead of returning the function result on the stack, it will be allocated straight away in the memory allocated for what, avoiding all copying altogether. Note that, contrary to all other compiler optimizations, RVO can change the behaviour of your program! You could give MyClass a non-default copy constructor, that has side effects, like printing a message to the console or liking a post on Facebook. Normally, the compiler is not allowed to remove such function calls unless it can prove that these side effects are absent. However, the C++ specs contain a special exception for RVO, that says that even if the copy constructor does something non-trivial, it is still allowed to omit the return value copy and reduce the whole thing to a single constructor call.

2. In the second case, the MyClass instance is not allocated on the stack, but on the heap. The result of the new operator is an integer: the address of the object on the heap. This is the only point where you will ever be able to obtain this address (provided you didn't use placement new), so you need to hold onto it: if you lose it, you cannot call delete and you will have created a memory leak. You assign the result of new to a variable whose type is denoted by MyClass* so that the compiler can do type checking and stuff, but in memory it is just an integer large enough to hold an address on your system (32- or 64-bits). You can check this for yourself by trying to coerce the result to a size_t (which is typedef'd to typically an unsigned int or something larger depending on your architecture) and seeing the conversion succeed. This integer is returned to the caller by value, i.e. on the stack, just as in example (1). So again, in principle, there is copying going on, but in this case only copying of a single integer which your CPU is very good at (most of the times it will not even go on the stack but get passed in a register) and not the whole MyClass object (which in general has to go on the stack because it's very large, read: larger than an integer).

3. Yes, you should not do that. Your analysis is correct: as the function finishes, the local object is cleaned up and its address becomes meaningless. The problem is, that it sometimes seems to work. Forgetting about optimizations for the time being, the main reason the way memory works: clearing (zero-ing) memory is quite expensive, so that is hardly ever done. Instead, it is just marked as available again, but it's not overwritten until you make another allocation that needs it. Therefore, even though the object is technically dead, its data may still be in the memory so when you dereference the pointer you may still get the right data back. However, since the memory is technically free, it may be overwritten at any time between right now and at the end of the universe. You have created what C++ calls Undefined Behaviour (UB): it may seem to work right now on your computer, but there's no telling what may happen somewhere else or at another point in time.

Bonus: When you return a vector by value, as you remarked, it is not just destroyed: it is first copied to the return value or - taking RVO into account - into the target variable. There are two options now: (1) The copy creates its own objects on the heap, and modifies its internal pointers accordingly. You now have two proper (deep) copies co-existing temporarily -- then when the temporary object goes out of scope, you are just left with the one valid vector. Or (2): When copying the vector, the new copy takes ownership of all the pointers that the old one holds. This is possible, if you know that the old vector is about to be destroyed: rather than re-allocating all the contents again on the heap, you can just move them to the new vector and leave the old one in a sort of half-dead state -- as soon as the function is done cleaning that stack the old vector is no longer there anyway. Which of these two options is used, is really irrelevant or rather, an implementation detail: they have the same result and whether the compiler is smart enough to choose (2) should not usually be your concern (though in practice option (2) will always happen: deep copying an object just to destroy the original is just pointless and easily avoided). As long as you realize that the thing that gets copied is the part on the stack and the ownership of the pointers on the heap gets transferred: no copying happens on the heap and nothing gets cleared.

0
On

Here are my answers to your different questions: 1- You are absolutely correct. If I understand the sequentiallity correctly, your code will allocate memory, create your object, copy the variable into the what variable, and get destroyed as out of scope. The same thing happens when you do:

int SomeFunction()
{
     return 10;
}

This will create a temporary that holds 10 (so allocate), copy it to the return vairbale, and then destroy the temporary (so deallocate) (Here I'm not sure of the specifics, maybe the compiler can remove some stuff via automatic inlining, constante values, ... but you get the idea). Which brings me to 1a- You need RVO when to limit this allocation, copy, and deallocation part. If your class allocates a lot of data upon construction it is a bad idea to return it directly. You can use move constructor in that case, and reuse the storage space allocated by the temporary for example. Or return a pointer. Which takes all the way down to

2- Returning a pointer works exactly as returning an int from a function. But because pointers are only 4 or 8 bytes long, allocation and deallocation cost a lot less than doing so for a class that's 10 Mb long. And instead of copying the object you copy its adress on the heap (usually less heavy, but copy nonetheless). Do not forget it is not because a pointer represents a memory that its size is 0 byte. So using a pointer requires getting the value from some memory address. Returning a reference and inlining are also good ideas to optimise your code, as you avoid chasing pointer, function calls, etc.

3- I think you are correct there. I'd have to make sure by testing, but if follow my logic you are right.

I hope I answered your questions. And I hope my answers are as correct as can be. But maybe someone more clever than me can correct me :-)

Best.