Why is assigning a container's element to the container (not) well-defined C++?


In C++ there is the infamous problem of self-assignment: when implementing operator=(const T &other), one has to be careful of the this == &other case to not destroy this's data before copying it from other.
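For reference, a minimal sketch of that classic hazard. The Buffer class and the helper function below are hypothetical names invented for this illustration; they are not part of the program in question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical owning buffer; the name and layout are illustrative only.
class Buffer {
public:
    explicit Buffer(std::size_t n, int value = 0)
        : size_(n), data_(new int[n]) {
        std::fill(data_, data_ + size_, value);
    }
    Buffer(const Buffer &other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    Buffer &operator=(const Buffer &other) {
        // Guard: the free-then-copy order below would read freed memory
        // if other were the same object as *this.
        if (this == &other) return *this;
        delete[] data_;
        data_ = nullptr;  // keeps the destructor safe if new throws
        size_ = 0;
        data_ = new int[other.size_];
        size_ = other.size_;
        std::copy(other.data_, other.data_ + size_, data_);
        return *this;
    }
    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    int at(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    std::size_t size_;
    int *data_;
};

// Assigns a Buffer to itself through an alias and reports its first element.
int self_assign_first_element() {
    Buffer b(3, 7);
    const Buffer &alias = b;
    b = alias;  // hits the guard, so b keeps its data
    return b.at(0);
}
```

The identity check only covers the case where both operands are the same object; it does nothing for a source that lives inside the target.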

However, *this and other may interact in more interesting ways than being the same object. Namely, one may contain the other. Consider the following code:

#include <iostream>
#include <string>
#include <utility>
#include <vector>
struct Foo {
    std::string s = "hello world very long string";
    std::vector<Foo> children;
};
int main() {
    std::vector<Foo> f(4);
    f[0].children.resize(2);
    f = f[0].children;  // (1)
    // auto tmp = f[0].children; f = std::move(tmp);  // (2)
    std::cout << f.size() << "\n";
}

I'd expect lines (1) and (2) to be equivalent, so that the program is well-defined and prints 2. However, I have yet to find a compiler and standard library combination that works with line (1) when Address Sanitizer is enabled: GCC with libstdc++, Clang with libc++, and Visual Studio with Microsoft STL all crash.

Curiously, disabling Address Sanitizer removes the crash, and the program prints 2.

Why is this operation prohibited or permitted by the C++ standard?

Extra question: the same, but with f[0].children = f. Extra-extra question: the same, but with std::any instead of std::vector<Foo>.

paddy On

I'm not convinced that (1) is well-defined, because in order to copy a new value into f[0], the old object residing at that location must first be destroyed, or is at the very least modified while it is being read through a const reference.

From std::vector<T,Allocator>::operator= (emphasis mine):

If the allocator of *this after assignment would compare unequal to its old value, the old allocator is used to deallocate the memory, then the new allocator is used to allocate it before copying the elements. Otherwise, the memory owned by *this may be reused when possible. In any case, the elements originally belonging to *this may be either destroyed or replaced by element-wise copy-assignment.

So in all of the scenarios above, it is possible that the object is destroyed before it has been copied, and you fall into the territory of behavior that is either undefined or specific to an implementation.

In practical terms, for the vector to reuse this memory it generally has to destroy the old element in place (an explicit destructor call) and then construct the new one with placement-new, and in these cases, once again, the referenced object being copied is destroyed in the process.

Even in the most lenient scenario (i.e. "replaced by element-wise copy-assignment"), you begin with Foo::operator=(const Foo&) invoked on f[0] to replace it with a copy of f[0].children[0]. That assignment also assigns f[0].children[0].children, which is empty, to f[0].children, so both elements of f[0].children are destroyed while the target vector's capacity (which is 2) stays unchanged. Before the next element is even reached, the const Foo& that was being copied from has been modified (in fact, destroyed), breaking its contract, and all bets are off.
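The effect of that first element-wise assignment can be observed without invoking the problematic assignment itself. The sketch below is an illustration under assumptions: it reuses Foo from the question, the SourceSizes struct and the function name are made up for the demo, and it goes through a detached copy of the first element so that every step stays well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Foo {
    std::string s = "hello world very long string";
    std::vector<Foo> children;
};

// Hypothetical helper type: size of the source vector before and after.
struct SourceSizes {
    std::size_t before;
    std::size_t after;
};

SourceSizes source_sizes_around_first_assignment() {
    std::vector<Foo> f(4);
    f[0].children.resize(2);

    // The vector that `f = f[0].children` would read from.
    const std::vector<Foo> &source = f[0].children;
    const std::size_t before = source.size();

    // Detached copy, so this sketch never reads an element while it is
    // being overwritten (an assumption made to keep the demo well-defined).
    const Foo first = source[0];
    assert(first.children.empty());

    // Same net effect on f[0] as the first element-wise copy-assignment.
    f[0] = first;

    return {before, source.size()};
}
```

The source starts with two elements and has none left after the first step, so a loop that planned to copy two elements would read destroyed objects from that point on.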

I don't think there's any automatic way to protect against that, short of something like a custom garbage-collecting allocator. You simply need to recognize the self-referential problem and avoid it. You worked around the problem in (2) by introducing a copy, and that is at least well-defined. It can be taken one step further by moving the data out of the container first:

auto tmp = std::move(f[0].children);
f = std::move(tmp);
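Both workarounds can be put side by side as a small sketch. Foo is taken from the question; make_tree, assign_via_copy and assign_via_move are hypothetical helper names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Foo {
    std::string s = "hello world very long string";
    std::vector<Foo> children;
};

// Builds the same shape as in the question: 4 roots, the first has 2 children.
std::vector<Foo> make_tree() {
    std::vector<Foo> f(4);
    f[0].children.resize(2);
    return f;
}

// Variant (2) from the question: copy the children out, then move them in.
std::size_t assign_via_copy() {
    std::vector<Foo> f = make_tree();
    auto tmp = f[0].children;  // independent copy, no longer owned by f
    f = std::move(tmp);
    return f.size();
}

// Moving instead of copying: tmp takes over the children's storage.
std::size_t assign_via_move() {
    std::vector<Foo> f = make_tree();
    auto tmp = std::move(f[0].children);
    assert(tmp.size() == 2);
    f = std::move(tmp);  // the source is a local object, not an element of f
    return f.size();
}
```

In both variants the source of the final assignment is a local object that does not live inside f, so destroying f's old elements cannot touch it.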

Perhaps the problem can be worked around more generally with careful application of std::shared_ptr, since your main issue is the destruction of data that you expected to still be referenced.
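One possible shape of that idea, as a rough sketch rather than a recommendation: the Node type and the function name are hypothetical, and holding every child list behind a std::shared_ptr is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical variant of Foo whose children are shared rather than owned inline.
struct Node {
    std::string s = "hello world very long string";
    std::shared_ptr<std::vector<Node>> children =
        std::make_shared<std::vector<Node>>();
};

std::size_t replace_root_with_children() {
    auto f = std::make_shared<std::vector<Node>>(4);
    (*f)[0].children->resize(2);

    // Take a second owner first; the child vector now outlives the old root.
    std::shared_ptr<std::vector<Node>> kept = (*f)[0].children;
    assert(kept.use_count() == 2);

    f = kept;  // the old root vector is destroyed here, the children survive
    return f->size();
}
```

Only the pointer is reassigned, so no element is copied over an object that is still being read; the cost is an extra allocation and indirection per node.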

I think this breaking of the const object's contract is really the key to answering your "extra" question about f[0].children = f without getting too deep into the details. In this case, children may be reallocated due to the required increase in capacity, and doing so modifies f, which was supposed to be const.
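The same copy-first approach should also sidestep the problem in that direction. A minimal sketch, with Foo taken from the question and a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Foo {
    std::string s = "hello world very long string";
    std::vector<Foo> children;
};

// Stores a snapshot of the whole tree inside its own first element.
std::size_t assign_parent_into_child() {
    std::vector<Foo> f(4);
    f[0].children.resize(2);

    auto tmp = f;                    // deep copy taken before anything is modified
    f[0].children = std::move(tmp);  // the source no longer contains the target

    assert(f.size() == 4);
    return f[0].children.size();
}
```

The snapshot is complete before f[0].children is touched, so the assignment never reads from an object it is in the middle of changing.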