How to properly access mapped memory without undefined behavior in C++

1.9k Views Asked by At

I've been trying to figure out how to access a mapped buffer from C++17 without invoking undefined behavior. For this example, I'll use a buffer returned by Vulkan's vkMapMemory.

So, according to N4659 (the final C++17 working draft), section [intro.object] (emphasis added):

The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2).

These are, apparently, the only valid ways to create a C++ object. So let's say we get a void* pointer to a mapped region of host-visible (and coherent) device memory (assuming, of course, that all the required arguments have valid values and the call succeeds, and the returned block of memory is of sufficient size and properly aligned):

void* ptr{};
vkMapMemory(device, memory, offset, size, flags, &ptr);
assert(ptr != nullptr);

Now, I wish to access this memory as a float array. The obvious thing to do would be to static_cast the pointer and go on my merry way as follows:

volatile float* float_array = static_cast<volatile float*>(ptr);

(The volatile is included since this is mapped as coherent memory, and thus may be written by the GPU at any point). However, a float array doesn't technically exist in that memory location, at least not in the sense of the quoted excerpt, and thus accessing the memory through such a pointer would be undefined behavior. Therefore, according to my understanding, I'm left with two options:

1. memcpy the data

It should always be possible to use a local buffer, cast it to std::byte* and memcpy the representation over to the mapped region. The GPU will interpret it as instructed in the shaders (in this case, as an array of 32-bit float) and thus problem solved. However, this requires extra memory and extra copies, so I would prefer to avoid this.

2. placement-new the array

It appears that section [new.delete.placement] doesn't impose any restrictions on how the placement address is obtained (it need not be a safely-derived pointer regardless of the implementation's pointer safety). It should, therefore, be possible to create a valid float array via placement-new as follows:

volatile float* float_array = new (ptr) volatile float[sizeInFloats];

The pointer float_array should now be safe to access (within the bounds of the array, or one-past).


So, my questions are the following:

  1. Is the simple static_cast indeed undefined behavior?
  2. Is this placement-new usage well-defined?
  3. Is this technique applicable to similar situations, such as accessing memory-mapped hardware?

As a side note, I've never had an issue by simply casting the returned pointer, I'm just trying to figure out what the proper way to do this would be, according to the letter of the standard.

3

There are 3 best solutions below

1
On BEST ANSWER

Short answer

As per the Standard, everything involving hardware-mapped memory is undefined behavior since that concept does not exist for the abstract machine. You should refer to your implementation manual.


Long answer

Even though hardware-mapped memory is undefined behavior by the Standard, we can imagine any sane implementation providing some obeys common rules. Some constructs are then more undefined behavior than others (whatever that means).

Is the simple static_cast indeed undefined behavior?

volatile float* float_array = static_cast<volatile float*>(ptr);

Yes, this is undefined behavior and have been discussed many times on StackOverflow.

Is this placement-new usage well-defined?

volatile float* float_array = new (ptr) volatile float[N];

No, even though this looks well defined, this is implementation dependent. As it happens, operator ::new[] is allowed to reserve some overhead1, 2, and you cannot know how much unless you check your toolchain documentation. As a consequence, ::new (dst) T[N] requires an unknown amount of memory greater or equal to N*sizeof T and any dst you allocate might be too small, involving buffer overflow.

How to proceed, then?

A solution would be to manually build a sequence of floats:

auto p = static_cast<volatile float*>(ptr);
for (std::size_t n = 0 ; n < N; ++n) {
    ::new (p+n) volatile float;
}

Or equivalently, relying on the Standard Library:

#include <memory>
auto p = static_cast<volatile float*>(ptr);
std::uninitialized_default_construct(p, p+N);

This constructs contiguously N uninitialized volatile float objects at the memory pointed to by ptr. This means you must initialize those before reading them; reading an uninitialized object is undefined behavior.

Is this technique applicable to similar situations, such as accessing memory-mapped hardware?

No, again this is really implementation-defined. We can only assume your implementation took reasonable choices, but you should check what its documentation says.

2
On

The C++ spec has no concept of mapped memory, so everything to do with it is undefined behavior as far as the C++ spec is concerned. So you need to look to the particular implementation (compiler and operating system) that you are using to see what is defined and what you can do safely.

On most systems, mapping will return memory that came from somewhere else, and may (or may not) have been initialized in a way that is compatible with some specific type. In general, if the memory was originally written as float values of the correct, supported form, then you can safely cast the pointer to a float * and access it that way. But you do need to know how the memory being mapped was originally written.

1
On

C++ is compatible with C, and manipulating raw memory is kind-of-what C was perfect for. So don't worry, C++ is perfectly capable of doing what you want.

  • Edit:- follow this link for the simple answer for C/C++ compatibilty. -

In your example, you do not need to call new at all! To explain...

Not all objects in C++ require construction. These are known as PoD (plain-old-data) types. They are

1) Basic types (floats/ints/enums,etc).
2) All Pointers, but not smart pointers. 3) Arrays of PoD types.
4) Structures that contain only basic types, or other PoD types.
...
5) A class too can be a PoD-type, but the convention is that anything declared "class" should never be relied on to be PoD.

You can test whether a type is PoD using a standard function library object.

Now the only thing that is undefined about casting a pointer to a PoD-type, is that the contents of the structure are not set by anything, so you should treat them as "write-only" values. In your case you may have written to them from the "device" and so initializing them will destroy those values. (The correct cast is a "reinterpret_cast" by the way)

You are right to worry about alignment issues, but you are wrong to think this is something the C++ code can fix. Alignment is a property of the memory, not a language feature. To align the memory, you must make sure the "offset" is always a multiple of the "alignas" of your structure. On x64/x86, getting this wrong will not create any problems, only slow down the access to your memory. On other systems it can cause a fatal exception.
On the other hand, your memory is not "volatile", it is accessed by another thread. This thread may be on another device, but it is another thread. You need to use thread-safe memory. In C++, this is provided by atomic variables. However, an "atomic" is not a PoD object! You should use a memory fence instead. These primitives force the memory to be read from and to memory. The volatile keyword does this too, but the compiler is allowed to reorder volatile writes and this may cause unexpected results.

Finally, if you want your code to be "modern C++" style, you should do the following.
1) Declare your custom PoD structure to represent your data layout. You can use static_assert(std::is_pod<MyType>::value). This will warn you if the structure is not compatible.
2) Declare a pointer to your type. (In this case only, do not use a smart pointer, unless there is a way to "free" the memory which makes sense)
3) Only allocate memory via a call which returns this pointer type. This function needs to
a) Initialize your pointer type with the result of your call to the Vulkan API.
b) Use an in-place new on the pointer - this is not required if you only write to the data - but is good practice. If you want to default values, initialize them in your structure declaration. If you want to keep the values, simply don't give them default values, and the in-place new will do nothing.

Use an "acquire" fence before reading the memory, a "release" fence after writing. Vulcan may provide a specific mechanism for this, I don't know. It is normal though for all synchronization primitives (such as mutex lock/unlock) to imply a memory fence, so you may get away without this step.