Pointer/integer arithmetic (un)defined behaviour


I have the following function template:

#include <cstdint>

template <class MostDerived, class HeldAs>
HeldAs* duplicate(MostDerived *original, HeldAs *held)
{
  // error checking omitted for brevity
  MostDerived *copy = new MostDerived(*original);
  // byte offset of *held within *original
  std::uintptr_t distance = reinterpret_cast<std::uintptr_t>(held)
                          - reinterpret_cast<std::uintptr_t>(original);
  HeldAs *copyHeld = reinterpret_cast<HeldAs*>(reinterpret_cast<std::uintptr_t>(copy) + distance);
  return copyHeld;
}

The purpose is to duplicate an object of a particular type and return it "held" by the same subobject as the input. Note that in principle, HeldAs can be an ambiguous or inaccessible base class of MostDerived, so no cast can help here.
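To illustrate why no cast can help, here is a minimal sketch with a hypothetical hierarchy (`A`, `B`, `C`, `D` and the helper names are made up for the example) in which `A` is an ambiguous base of `D`. `static_cast<A*>(&d)` would not compile, and an `A*` on its own does not say which of the two `A` subobjects it points to, so the path has to be spelled out explicitly:

```cpp
#include <cassert>

// Hypothetical hierarchy: A is an *ambiguous* base of D, because D holds two
// distinct A subobjects, one inherited through B and one through C.
struct A { int tag = 0; };
struct B : A { B() { tag = 1; } };
struct C : A { C() { tag = 2; } };
struct D : B, C {};

// static_cast<A*>(&d) is ill-formed here (ambiguous conversion), so the
// path to the wanted A subobject has to be named explicitly:
inline A* a_via_b(D* d) { return static_cast<B*>(d); }  // D* -> B* -> A*
inline A* a_via_c(D* d) { return static_cast<C*>(d); }  // D* -> C* -> A*

inline void demo_ambiguous_base()
{
    D d;
    assert(a_via_b(&d) != a_via_c(&d));  // two different A subobjects
    assert(a_via_b(&d)->tag == 1);
    assert(a_via_c(&d)->tag == 2);
}
```

A generic function that only receives a `HeldAs*` has no way to encode such a path, which is what pushes `duplicate` towards address arithmetic.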

This is my code, but it can be used with types outside my control (i.e. I cannot modify MostDerived or HeldAs). The function has the following preconditions:

  • *original is of dynamic type MostDerived
  • HeldAs is MostDerived or a direct or indirect base class of MostDerived (ignoring cv-qualification)
  • *held refers to *original or one of its base class subobjects.
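Some of these preconditions can be checked mechanically. The following is a sketch only: `duplicate_preconditions_hold` is a hypothetical helper, not part of the code above, and it assumes C++17 for `if constexpr`. The dynamic-type check works only for polymorphic types, and the third precondition is left unchecked:

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

// Hypothetical helper: checks the preconditions of duplicate() that can be
// checked at all. Requires C++17 (if constexpr).
template <class MostDerived, class HeldAs>
bool duplicate_preconditions_hold(const MostDerived* original, const HeldAs* held)
{
    using M = std::remove_cv_t<MostDerived>;
    using H = std::remove_cv_t<HeldAs>;

    // Precondition 2: std::is_base_of ignores accessibility and ambiguity,
    // and is also true when both name the same class type.
    static_assert(std::is_base_of<H, M>::value,
                  "HeldAs must be MostDerived or a base class of it");

    if (original == nullptr || held == nullptr)
        return false;

    // Precondition 1: the dynamic type can only be queried for polymorphic
    // types; for non-polymorphic MostDerived it has to be trusted.
    if constexpr (std::is_polymorphic<M>::value) {
        if (typeid(*original) != typeid(M))
            return false;
    }

    // Precondition 3 (held points into *original) is not checked here.
    return true;
}

// Hypothetical types for the usage example.
struct Base { virtual ~Base() = default; };
struct Derived : Base {};

inline void demo_preconditions()
{
    Derived d;
    Base* as_base = &d;
    assert(duplicate_preconditions_hold(&d, as_base));        // MostDerived = Derived
    assert(!duplicate_preconditions_hold(as_base, as_base));  // dynamic type is Derived, not Base
}
```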

Let's assume the preconditions are satisfied. Does duplicate have defined behaviour in that case?

C++11 [expr.reinterpret.cast] says (bold emphasis mine):

4 A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined. [ Note: It is intended to be unsurprising to those who know the addressing structure of the underlying machine. —end note ] ...

5 A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined. [ Note: Except as described in 3.7.4.3, the result of such a conversion will not be a safely-derived pointer value. —end note ]
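The one guarantee paragraph 5 does give is the round trip: pointer to integer and back to the same pointer type. A minimal sketch, assuming `std::uintptr_t` exists on the implementation (it is optional in the standard) and is large enough for `T*`, which holds on the usual GCC/Clang targets; `round_trip` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// [expr.reinterpret.cast]/5: a pointer converted to a large-enough integer
// and back to the *same* pointer type has its original value.
template <class T>
T* round_trip(T* p)
{
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);  // implementation-defined mapping
    return reinterpret_cast<T*>(bits);                          // guaranteed to compare equal to p
}

inline void demo_round_trip()
{
    int x = 42;
    int* q = round_trip(&x);
    assert(q == &x);
    assert(*q == 42);
}
```

Note that this covers only an unmodified integer; `duplicate` adds an offset before converting back, which is exactly where the guarantee stops.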

OK, let's say my compiler is GCC (or Clang, since that uses GCC's definitions of implementation-defined behaviour). Quoting chapter 5 of the GCC documentation, on C++ implementation-defined behaviour:

... Some choices are documented in the corresponding document for the C language. See C Implementation. ...

On to chapter 4.7 (C implementation, arrays and pointers):

The result of converting a pointer to an integer or vice versa (C90 6.3.4, C99 and C11 6.3.2.3).

A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends if the pointer representation is smaller than the integer type, otherwise the bits are unchanged.

A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.

So far, so good. It would seem that since I'm using std::uintptr_t, which is guaranteed to be large enough to hold any object pointer, and since I'm dealing with the same types, copyHeld should point to the same HeldAs subobject of *copy as held was pointing to within *original.
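The reasoning above rests on the distance being a property of MostDerived's layout rather than of a particular object. A small sketch of that assumption, using a hypothetical hierarchy and GCC/Clang on a flat-address-space target (Itanium C++ ABI, where a complete object's layout is fixed): it only converts pointers *to* integers, which is implementation-defined rather than undefined, and never converts an integer back into a pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hierarchy: D contains two A subobjects (via B and via C).
struct A { int tag = 0; };
struct B : A { int b = 0; };
struct C : A { int c = 0; };
struct D : B, C {};

// Byte distance from the start of a D to its C-path A subobject, computed the
// way duplicate() computes `distance`. Pointers are only converted to
// integers here; no integer becomes a pointer again.
inline std::uintptr_t offset_of_c_path_a(D* d)
{
    A* held = static_cast<C*>(d);  // D* -> C* -> A*
    return reinterpret_cast<std::uintptr_t>(held) - reinterpret_cast<std::uintptr_t>(d);
}

inline void demo_offsets()
{
    D first, second;
    // With GCC's "bits unchanged" mapping, the distance depends only on the
    // layout of D, so it is the same for every complete D object.
    assert(offset_of_c_path_a(&first) == offset_of_c_path_a(&second));
}
```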

Unfortunately, there's one more paragraph in the GCC docs:

When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

Wham. So now it seems that even though the value of copyHeld is computed in accordance with the rules of the first two paragraphs, the third one still sends this into Undefined-Behaviour land.

I basically have three questions:

  1. Is my reading correct and the behavior of duplicate undefined?

  2. Which kind of Undefined Behaviour is this? The "formally undefined, but will do what you want anyway" kind, or the "expect random crashes and/or spontaneous self-immolation" one?

  3. If it's really Undefined, is there a way to do such a thing in a well-defined (possibly compiler-dependent) way?
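One commonly used variant, sketched here under stated assumptions, does the offset computation with `unsigned char*` arithmetic instead of integer arithmetic, so no integer is ever converted back into a pointer and the GCC paragraph about integer round trips does not apply. Whether this is strictly well-defined under the C++ standard's pointer-arithmetic and object-representation rules is itself debated, so treat it as an alternative, not a guaranteed fix. `corresponding_subobject` and the hierarchy are hypothetical; the caller supplies the copy, so `duplicate` could pass `new MostDerived(*original)`:

```cpp
#include <cassert>

// Hypothetical helper: given a subobject *held of *original, return the
// corresponding subobject of *copy, another complete object of type
// MostDerived. Uses byte-pointer arithmetic, no pointer/integer round trip.
template <class MostDerived, class HeldAs>
HeldAs* corresponding_subobject(const MostDerived* original, const HeldAs* held,
                                MostDerived* copy)
{
    const unsigned char* base  = reinterpret_cast<const unsigned char*>(original);
    const unsigned char* inner = reinterpret_cast<const unsigned char*>(held);
    auto offset = inner - base;  // byte offset of the subobject (std::ptrdiff_t)
    unsigned char* target = reinterpret_cast<unsigned char*>(copy) + offset;
    return reinterpret_cast<HeldAs*>(target);
}

// Hypothetical hierarchy in which A is an ambiguous base of D.
struct A { int tag = 0; };
struct B : A { B() { tag = 1; } };
struct C : A { C() { tag = 2; } };
struct D : B, C {};

inline void demo_corresponding_subobject()
{
    D original;
    A* held = static_cast<C*>(&original);  // the C-path A subobject
    D copy = original;                     // copy made by the caller
    A* copy_held = corresponding_subobject(&original, held, &copy);
    assert(copy_held == static_cast<C*>(&copy));  // same subobject of the copy
    assert(copy_held->tag == 2);
}
```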

While my question is limited to GCC (and Clang) behaviour as far as compilers are concerned, I'd welcome an answer which considers all kinds of HW platforms, from common desktops to exotic ones.

Answer

The usual pattern for this is to put a virtual clone() function in the base class; each derived class then implements its own version of clone().

class Base
{
public:
    virtual ~Base() = default;        // lets clones be deleted through Base*
    virtual Base* clone() const = 0;
};

class D : public Base
{
public:
    Base* clone() const override { return new D(*this); }
};
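A self-contained sketch of the clone pattern with a usage example, assuming you are able to add clone() to the hierarchy (the question notes this is not always possible for types outside one's control). `copy_of` is a hypothetical helper that wraps the raw pointer in `std::unique_ptr` so the clone is released through the virtual destructor:

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

class Base
{
public:
    virtual ~Base() = default;        // clones are deleted through Base*
    virtual Base* clone() const = 0;
};

class D : public Base
{
public:
    Base* clone() const override { return new D(*this); }
};

// Copies an object through a Base reference without knowing its dynamic type.
inline std::unique_ptr<Base> copy_of(const Base& b)
{
    return std::unique_ptr<Base>(b.clone());
}

inline void demo_clone()
{
    D d;
    const Base& ref = d;
    std::unique_ptr<Base> c = copy_of(ref);
    Base& copied = *c;
    assert(typeid(copied) == typeid(D));  // dynamic type is preserved
    assert(c.get() != &d);                // a distinct, new object
}
```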