How to force return value optimization in msvc

I have a function in a class that I want the compiler to use NRVO on...all the time...even in debug mode. Is there a pragma for this?

Here is my class that works great in "release" mode:

template <int _cbStack> class CBuffer {
public:
    CBuffer(int cb) : m_p(0) { 
        m_p = (cb > _cbStack) ? (char*)malloc(cb) : m_pBuf;
    }
    template <typename T> operator T () const { 
        return static_cast<T>(m_p); 
    }
    ~CBuffer() { 
        if (m_p && m_p != m_pBuf) 
            free(m_p); 
    }
private: 
    char *m_p, m_pBuf[_cbStack];
};

The class is used to make a buffer on the stack unless more than _cbStack bytes are required, in which case it allocates from the heap. When it destructs, it frees any memory it allocated. It's handy when interfacing to C functions that require a string buffer and you are not sure of the maximum size.

Anyway, I was trying to write a function that could return CBuffer, like in this test:

#include "stdafx.h"
#include <malloc.h>
#include <string.h>

template <int _cbStack> CBuffer<_cbStack> foo() 
{ 
    // return a Buf populated with something...
    unsigned long cch = 500;
    CBuffer<_cbStack> Buf(cch + 1);
    memset(Buf, 'a', cch);  
    ((char*)Buf)[cch] = 0;
    return Buf;
}

int _tmain(int argc, _TCHAR* argv[])
{
    auto Buf = foo<256>();
    return 0;
}

I was counting on NRVO to make foo() fast. In release mode, it works great. In debug mode, it obviously fails, because there is no copy constructor in my class. I don't want a copy constructor, since CBuffer will be used by developers who like to copy everything 50 times. (Rant: these guys were using a dynamic array class to create a buffer of 20 chars to pass to WideCharToMultiByte(), because they seem to have forgotten that you can just allocate an array of chars on the stack. I don't know if they even know what the stack is...)

I don't really want to code up the copy constructor just so the code works in debug mode! It gets huge and complicated:

template <int _cbStack> 
class CBuffer {
public:
    CBuffer(int cb) : m_p(0) { Allocate(cb); }
    CBuffer(CBuffer<_cbStack> &r) { 
        int cb = (r.m_p == r.m_pBuf) ? _cbStack : ((int*)r.m_p)[-1];
        Allocate(cb);
        memcpy(m_p, r.m_p, cb);
    }
    CBuffer(CBuffer<_cbStack> &&r) { 
        if (r.m_p == r.m_pBuf) {
            m_p = m_pBuf;
            memcpy(m_p, r.m_p, _cbStack);
        } else {
            m_p = r.m_p;
            r.m_p = NULL;
        }
    }
    template <typename T> operator T () const {
        return static_cast<T>(m_p); 
    }
    ~CBuffer() {
        if (m_p && m_p != m_pBuf) 
            free((int*)m_p - 1); 
    }
protected: 
    void Allocate(int cb) {
        if (cb > _cbStack) {
            m_p = (char*)malloc(cb + sizeof(int));
            *(int*)m_p = cb;
            m_p += sizeof(int);
        } else {
            m_p = m_pBuf; 
        }
    }
    char *m_p, m_pBuf[_cbStack];
};

This pragma does not work:

 #pragma optimize("gf", on)

Any ideas?

There are 3 best solutions below

1
On

I don't think there is a publicly available fine-grained compiler option that only triggers NRVO.

However, you can still control the compiler's optimization flags per source file through the project settings, the command line, or #pragma.

http://msdn.microsoft.com/en-us/library/chh3fb0k(v=vs.110).aspx

Try applying /O1 or /O2 to the file in question.

Also, debug mode in Visual C++ is nothing more than a configuration that disables optimizations and generates debugging information (a PDB, or program database, file).

3
On

If you are using Visual C++ 2010 or later, you can use move semantics to achieve an equivalent result. See How to: Write a Move Constructor.
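As a rough illustration of that suggestion, here is a minimal sketch of a small-buffer class with a hand-written move constructor and copying disabled. The names SmallBuffer, on_heap and make_filled are hypothetical and not from the question, and handling of a failed malloc is left out. The `= delete` syntax assumes a C++11 compiler; older Visual C++ releases would need private, undefined copy members instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical small-buffer class: inline storage up to N bytes, heap beyond that.
template <std::size_t N>
class SmallBuffer {
public:
    explicit SmallBuffer(std::size_t cb)
        : m_p(cb > N ? static_cast<char*>(std::malloc(cb)) : m_buf) {}

    // Move constructor: steal the heap block, or copy the inline bytes.
    SmallBuffer(SmallBuffer&& other) {
        if (other.m_p == other.m_buf) {
            std::memcpy(m_buf, other.m_buf, N);
            m_p = m_buf;
        } else {
            m_p = other.m_p;
            other.m_p = other.m_buf;  // the source no longer owns the heap block
        }
    }

    // Copying stays disabled, so nobody can duplicate the buffer by accident.
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    ~SmallBuffer() {
        if (m_p != m_buf) std::free(m_p);
    }

    char* data() { return m_p; }
    bool on_heap() const { return m_p != m_buf; }

private:
    char* m_p;
    char  m_buf[N];
};

// Returns a buffer holding cch copies of 'a' plus a terminator.
template <std::size_t N>
SmallBuffer<N> make_filled(std::size_t cch) {
    SmallBuffer<N> buf(cch + 1);
    std::memset(buf.data(), 'a', cch);
    buf.data()[cch] = '\0';
    return buf;  // NRVO or the move constructor; the result is valid either way
}
```

With this in place, `auto buf = make_filled<256>(500);` compiles and behaves the same in debug and release builds; the only difference is whether a move actually runs.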

6
On

It is not hard to make your code both standards conforming and work.

First, wrap arrays of T with optional extra padding. Now you know the layout.

For ownership, use a unique_ptr instead of a raw pointer. If it is non-null, operator T* returns it; otherwise it returns the internal buffer. Now your default move ctor works, and the code stays correct whether NRVO kicks in or the move is used instead.

If you want to support non-POD types, a bit of work will let you support ctors and dtors, moving the array elements properly and the padding bit for bit.

The result will be a class that does not behave surprisingly and will not create bugs the first time someone tries to copy or move it - well, not the first, that would be easy. The code as written will blow up in different ways at different times!

Obey the rule of three.

Here is an explicit example (now that I'm off my phone):

#include <cstddef>
#include <memory>

template <typename T, size_t bufSize=sizeof(T)>
struct CBuffer {
  typedef T value_type;

  // The default arguments also make this the default constructor.
  explicit CBuffer(size_t count=1, size_t extra=0) {
    resize(count, extra);
  }
  // Use the heap only when the request does not fit in the in-object buffer.
  void resize(size_t count, size_t extra=0) {
    size_t amount = sizeof(value_type)*count + extra;
    if (amount > bufSize) {
      m_heapBuffer.reset( new char[amount] );
    } else {
      m_heapBuffer.reset();
    }
  }
  explicit operator value_type const* () const { 
    return get();
  }
  explicit operator value_type* () { 
    return get();
  }
  T* get() {
    return reinterpret_cast<value_type*>(getPtr());
  }
  T const* get() const {
    return reinterpret_cast<value_type const*>(getPtr());
  }
private: 
  std::unique_ptr< char[] > m_heapBuffer;
  char m_Buffer[bufSize];
  // Heap block if one was allocated, otherwise the in-object buffer.
  char const* getPtr() const {
    if (m_heapBuffer)
      return m_heapBuffer.get();
    return &m_Buffer[0];
  }
  char* getPtr() {
    if (m_heapBuffer)
      return m_heapBuffer.get();
    return &m_Buffer[0];
  }
};

The above CBuffer supports move construction and move assignment, but not copy construction or copy assignment. This means you can return a local instance of these from a function. RVO may occur, but if it doesn't the above code is still safe and legal (assuming T is POD).

Before putting it into production myself, I would add some asserts to the above that T must be POD, or handle non-POD T.
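One way such a check might look is a static_assert on a type trait. This is only a sketch: PodOnlyBuffer is a hypothetical stand-in for the class above, and std::is_trivially_copyable is used here as an approximation of "POD", on the assumption that being safe to copy byte for byte is the property that matters.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the buffer class: refuses to compile for any T
// that cannot safely be copied around as raw bytes.
template <typename T>
struct PodOnlyBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "PodOnlyBuffer<T> requires a trivially copyable T");
    typedef T value_type;
};

// Compiles: char is trivially copyable.
typedef PodOnlyBuffer<char> CharBuffer;

// Would fail to compile if instantiated, because std::string owns memory:
// PodOnlyBuffer<std::string> bad;
```

The failure shows up at compile time with the message from the assert, rather than as a crash the first time someone stores a non-trivial type in the buffer.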

As an example of use:

#include <cstring>
#include <iostream>
size_t fill_buff(size_t len, char* buff) {
  char const* src = "This is a string";
  size_t needed = strlen(src)+1;
  if (len < needed)
    return needed;
  strcpy( buff, src );
  return needed;
}
void test1() {
  size_t amt = fill_buff(0,0);
  CBuffer<char, 100> strBuf(amt);
  fill_buff( amt, strBuf.get() );
  std::cout << strBuf.get() << "\n";
}

And, for the (hopefully) NRVO'd case:

template<size_t n>
CBuffer<char, n> test2() {
  CBuffer<char, n> strBuf;
  size_t amt = fill_buff(0,0);
  strBuf.resize(amt);
  fill_buff( amt, strBuf.get() );
  return strBuf;
}

which, if NRVO occurs (as it should), won't need a move -- and if NRVO doesn't occur, the implicit move that occurs is logically equivalent to not doing the move.

The point is that NRVO isn't relied upon to have well defined behavior. However, NRVO is almost certainly going to occur, and when it does occur it does something logically equivalent to doing the move-constructor option.

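I didn't have to write such a move-constructor, because unique_ptr is move-constructible, as are arrays inside structs. (Visual C++ releases before Visual Studio 2015 do not generate move constructors implicitly, so there you would have to write it out by hand.) Also note that copy-construction is blocked, because unique_ptr cannot be copy-constructed: this aligns with your needs.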

In debug, it is quite possibly true that you'll end up doing a move-construct. But there shouldn't be any harm in that.
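If you want to see which path a given build takes, a small instrumented type can count the moves. The following is a sketch with hypothetical names (Tracer, make_tracer, moves_for_one_return); the exact count depends on the compiler, its settings and the language standard in use.

```cpp
#include <cassert>

// Hypothetical instrumented type: move-only, and it counts every move construction.
struct Tracer {
    static int moves;
    Tracer() {}
    Tracer(Tracer&&) { ++moves; }
    Tracer(const Tracer&) = delete;
};
int Tracer::moves = 0;

Tracer make_tracer() {
    Tracer t;
    return t;  // NRVO candidate; falls back to the move constructor otherwise
}

// Returns how many move constructions one "return by value" cost in this build.
int moves_for_one_return() {
    Tracer::moves = 0;
    Tracer t = make_tracer();
    (void)t;
    return Tracer::moves;
}
```

A result of 0 means the copy was elided entirely; a nonzero result means the move constructor ran. Either way the program is well defined, which is the point of not depending on NRVO.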