Who should clear a vector retrieved by reference?

145 Views Asked by At

Let's say I have this very simple code:

  std::vector<int> getVector( int size )
  {
      std::vector<int> res;
      for ( size_t i = 0; i != size; ++i )
          res.push_back( i* 23 );
      return res;
  }
  
  int main ()
  {
      std::vector<int> v;
      v = getVector(10);
      std::cout << "Size1 is " << v.size() << std::endl;
      v = getVector(15);
      std::cout << "Size2 is " << v.size() << std::endl;
  }

It outputs:

Size1 is 10

Size2 is 15

Now, let's say I want to change getVector to avoid useless object copies and optimize speed and memory usage. So getVector will now get a reference to the vector to be filled.

  void getVector( int size, std::vector<int>& res )
  {
      for ( size_t i = 0; i != size; ++i )
          res.push_back( i* 23 );
  }

Now, if I change the main function, without paying attention:

  int main ()
  {
      std::vector<int> v;
      getVector(10,v);
      std::cout << "Size1 is " << v.size() << std::endl;
      getVector(15,v);
      std::cout << "Size2 is " << v.size() << std::endl;
  }

It outputs:

Size1 is 10

Size2 is 25

Which is not what I originally had.

I've done this kind of change (replaced returned value by object passed at reference) tones of times, and I've always been asking myself the same question: who should clear the vector?

Is there any "guidline" or "general rule" saying who should clear the vector in this case?

Should the function itself do it?

  void getVector( int size, std::vector<int>& res )
  {
      res.clear();
      for ( size_t i = 0; i != size; ++i )
          res.push_back( i* 23 );
  }

Or should the caller do it?

  int main ()
  {
      std::vector<int> v;
      getVector(10,v);
      std::cout << "Size1 is " << v.size() << std::endl;
      v.clear();
      getVector(15,v);
      std::cout << "Size2 is " << v.size() << std::endl;
  }

Edit:

Example may be mischosen because it's a "bad optimization", as reported by many persons. But there may be cases where a vector is retrieved by reference and where question remains. For example:

bool hasData( std::vector<int>& retrievedData );

or

void splitVector( const std::vector<int>& originalVector,
                  std::vector<int>& part1,
                  std::vector<int>& part2 );

...

4

There are 4 best solutions below

0
On BEST ANSWER

In you first example the function is called "getVector" which is appropriate to describe what the function does. It returns a vector with a certain number of elements, as expected.

In the second example the same does not apply. The function is still called "getVector" but now you want to clear the vector. It does not describe the function anymore because you're passing a vector to it and doing things with it. It's more like "takeAVectorClearItAndInitializeIt". And you probably don't want to name a function like that or use it in your code. (Naturally I'm exaggerating a little bit, but you get the idea)

To make a function easy to understand it should have one clear function and not do hidden things in the background (like clearing the client's vector). If I were to use your function I'd probably be confused with why my vector just got cleared.

Usually it's the client code that is responsible for managing its own variables. Even if the client code calls a function to clear the vector the responsibility is still in the client code for the intention of clearing its vector. In your case, you know from the client code what the "getVector" function does because you made them both so it does not confuse you, but when you write code that needs only one of the two functions (clear OR initialize), you may do some changes that affect all the code that used this function and create more problems.


On to answering your problem more specifically.

If you don't want to copy the vector passing a reference is fine but name the function something more descriptive like "initializeVector" so that it's more clear what you're doing.

You should usually call 'clear' from the client code though, so your first example is better until you have performance issues. (The clear is implicit as you're getting a different vector)

6
On

Now, let's say I want to change getVector to avoid useless object copies and optimize speed and memory usage. So getVector will now get a reference to the vector to be filled.

There is some things you should think about:

  • In your first example there is no copy because RVO/NRVO compiler optimization.
  • If you using C++11 all STL containers have move constructor. So, will be never copied in situations like this.
  • If you prefer to receive the reference, there is no other(correct) way than clear on caller's side.
  • You can optimize this snippet a bit using vector::reserve in the begining of getVector function.
0
On

Neither. Instead you should rethink your design decision. The action you performed to remove a (probably theoretical) performance problem - I read this from your wording

avoid useless object copies and optimize speed and memory usage

replaces two specific questions:

  1. What is performance bottleneck in this function?
  2. Should I avoid std::vector for performance reasons?

that are relatively easy to answer, by an abstract and hard one that has no true answer, but it causes severe headaches to you (and us): Your "optimization" actually causes a performance hit.

4
On

No single right . But think of the stringstream class object for reference . The one who intends to re-use the object of this class is required to do a clearup before reusing the object . The one who creates the object is generally responsible for deleting the same .