Let's say I have this very simple code:
std::vector<int> getVector( int size )
{
std::vector<int> res;
for ( size_t i = 0; i != size; ++i )
res.push_back( i* 23 );
return res;
}
int main ()
{
std::vector<int> v;
v = getVector(10);
std::cout << "Size1 is " << v.size() << std::endl;
v = getVector(15);
std::cout << "Size2 is " << v.size() << std::endl;
}
It outputs:
Size1 is 10
Size2 is 15
Now, let's say I want to change getVector
to avoid useless object copies and optimize speed and memory usage. So getVector
will now get a reference to the vector to be filled.
void getVector( int size, std::vector<int>& res )
{
for ( size_t i = 0; i != size; ++i )
res.push_back( i* 23 );
}
Now, if I change the main function, without paying attention:
int main ()
{
std::vector<int> v;
getVector(10,v);
std::cout << "Size1 is " << v.size() << std::endl;
getVector(15,v);
std::cout << "Size2 is " << v.size() << std::endl;
}
It outputs:
Size1 is 10
Size2 is 25
Which is not what I originally had.
I've done this kind of change (replaced returned value by object passed at reference) tones of times, and I've always been asking myself the same question: who should clear the vector?
Is there any "guidline" or "general rule" saying who should clear the vector in this case?
Should the function itself do it?
void getVector( int size, std::vector<int>& res )
{
res.clear();
for ( size_t i = 0; i != size; ++i )
res.push_back( i* 23 );
}
Or should the caller do it?
int main ()
{
std::vector<int> v;
getVector(10,v);
std::cout << "Size1 is " << v.size() << std::endl;
v.clear();
getVector(15,v);
std::cout << "Size2 is " << v.size() << std::endl;
}
Edit:
Example may be mischosen because it's a "bad optimization", as reported by many persons. But there may be cases where a vector is retrieved by reference and where question remains. For example:
bool hasData( std::vector<int>& retrievedData );
or
void splitVector( const std::vector<int>& originalVector,
std::vector<int>& part1,
std::vector<int>& part2 );
...
In you first example the function is called "getVector" which is appropriate to describe what the function does. It returns a vector with a certain number of elements, as expected.
In the second example the same does not apply. The function is still called "getVector" but now you want to clear the vector. It does not describe the function anymore because you're passing a vector to it and doing things with it. It's more like "takeAVectorClearItAndInitializeIt". And you probably don't want to name a function like that or use it in your code. (Naturally I'm exaggerating a little bit, but you get the idea)
To make a function easy to understand it should have one clear function and not do hidden things in the background (like clearing the client's vector). If I were to use your function I'd probably be confused with why my vector just got cleared.
Usually it's the client code that is responsible for managing its own variables. Even if the client code calls a function to clear the vector the responsibility is still in the client code for the intention of clearing its vector. In your case, you know from the client code what the "getVector" function does because you made them both so it does not confuse you, but when you write code that needs only one of the two functions (clear OR initialize), you may do some changes that affect all the code that used this function and create more problems.
On to answering your problem more specifically.
If you don't want to copy the vector passing a reference is fine but name the function something more descriptive like "initializeVector" so that it's more clear what you're doing.
You should usually call 'clear' from the client code though, so your first example is better until you have performance issues. (The clear is implicit as you're getting a different vector)