Is the author's union-based implementation of an optional<bool> well-defined in P2641?

In P2641r4: Checking if a union alternative is active, the author provides an implementation of an optional<bool> as a motivating example and claims that this is well-formed.

struct OptBool {
  union { bool b; char c; };

  OptBool() : c(2) { }
  OptBool(bool b) : b(b) { }

  auto has_value() const -> bool {
    return c != 2;
  }

  auto operator*() -> bool& {
    return b;
  }
};

However, I am not convinced. Namely, has_value() doesn't look safe: if the bool is the active union member, then c != 2 reads an inactive member, which is union type punning. To my knowledge, this is not allowed in C++.

The author explains that has_value() can't be made constexpr as written because an inactive union member is being read, and provides the following implementation:

  constexpr auto has_value() const -> bool {
    if consteval {
      return std::is_within_lifetime(&b);
    } else {
      return c != 2;
    }
  }

What did the author mean by this? Does this mean that you cannot read the inactive union member in a constant expression but it would otherwise be permitted? Is this code totally well-formed or does it rely on compiler extensions that would permit union type punning at run-time?


Note: This is a sister question to "Is the author's implementation of an optional<bool> well-defined in P2641?", which discusses the other implementation.

There are 2 best solutions below

0
On

I assume that operator* has the usual precondition that the object was constructed through the OptBool(bool b) overload. Using operator* on an empty optional is clearly UB, but it is also not the intended use.

When b is the active member, accessing c has undefined behavior because c is necessarily outside its lifetime.

The intent here is to look at the object representation, which can be achieved by adding a seemingly unnecessary cast:

return *reinterpret_cast<const unsigned char*>(reinterpret_cast<const OptBool*>(&c)) != 2;

The inner cast will yield a pointer to the OptBool object, because OptBool is standard-layout and pointer-interconvertible with the c subobject.

The outer cast then produces a pointer that still points to the OptBool object but has a pointer-to-unsigned-char type. Accessing through it is not an aliasing violation. However, the standard currently does not specify what value this access reads. The intention is for it to read the first byte of the object representation of the OptBool object (which is also the first byte of the bool or char object), but that isn't specified at the moment. P1839 is an attempt to fix that. In practice everyone assumes this behavior, and the fact that the standard doesn't say so is a defect.
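As an alternative to the double cast, the object representation can also be inspected by copying it out with std::memcpy, which as far as I can tell does not depend on which union member is active. This is a different technique from the one in the paper, shown here only as a minimal sketch. The member name has_value_bytes is made up for illustration, and the sketch keeps the same assumptions about bool (one byte, never stored as the byte value 2).

```cpp
#include <cstring>  // std::memcpy

// Variant of the OptBool from the question.
// has_value_bytes() is a hypothetical name, not taken from P2641.
struct OptBool {
  union { bool b; char c; };

  OptBool() : c(2) { }
  OptBool(bool b) : b(b) { }

  // Copies the whole object representation into a byte array and inspects
  // the first byte, instead of naming the possibly inactive member c.
  // Assumption: bool occupies one byte and is never stored as the value 2.
  auto has_value_bytes() const -> bool {
    unsigned char bytes[sizeof(OptBool)];
    std::memcpy(bytes, this, sizeof(OptBool));
    return bytes[0] != 2;
  }
};
```

With this variant, a default-constructed OptBool reports no value, while OptBool(true) and OptBool(false) both report a value. Like the original, it cannot be used in a constant expression, so it only replaces the run-time branch.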

In any case, the implementation of course assumes a specific implementation of bool, specifically its size, alignment and object/value representations.
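Some of these assumptions can be turned into compile-time checks. The following is only a sketch of what such checks might look like; the exact set of assertions is my own choice, not something the paper prescribes. It covers size, alignment and layout, but not the stored representation of true and false, which cannot be checked this way before C++20.

```cpp
#include <type_traits>  // std::is_standard_layout_v

// The OptBool from the question, reduced to the parts relevant for layout.
struct OptBool {
  union { bool b; char c; };

  OptBool() : c(2) { }
  OptBool(bool b) : b(b) { }
};

// Assumptions the trick relies on; each one is implementation-specific.
static_assert(sizeof(bool) == 1,
              "bool must occupy exactly one byte, like char");
static_assert(alignof(bool) == alignof(char),
              "bool and char must have the same alignment");
static_assert(std::is_standard_layout_v<OptBool>,
              "needed for pointer-interconvertibility with the first member");
static_assert(sizeof(OptBool) == 1,
              "the union must not add padding around the single byte");
```

If any of these fail on a given implementation, the has_value() trick should not be used there.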

1
On

There is nothing in the standard requiring false or true to not be stored as the value 2.
int(false) is required to be 0 and int(true) is required to be 1 but those are results of casts, not the value stored in memory.
See https://stackoverflow.com/a/19351548/362589
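To illustrate the difference, here is a small sketch that separates what is guaranteed (the conversion results) from what can only be observed on a particular implementation (the stored byte). The helper stored_byte is a made-up name, and it assumes that sizeof(bool) is 1.

```cpp
#include <cstring>  // std::memcpy

// Guaranteed on every implementation: these are conversion results.
static_assert(int(false) == 0, "conversion of false yields 0");
static_assert(int(true) == 1, "conversion of true yields 1");

// Hypothetical helper: returns the first byte of the object representation
// of v. What it returns is implementation-specific; assumes sizeof(bool) == 1.
inline unsigned char stored_byte(bool v) {
  unsigned char byte;
  std::memcpy(&byte, &v, 1);
  return byte;
}
```

On common implementations stored_byte(false) and stored_byte(true) are 0 and 1, but that is an observation about those implementations rather than a guarantee.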