Correctly reinterpreting array of struct of int as array of int


I have a std::vector of a custom struct which contains two integers (and only two integers):

struct S {
    int p0;
    int p1;
};
std::vector<S> v(dimension);

I want a pointer to this array of structs that can be interpreted as an array of int with twice the original number of elements.
The context is the following: the std::vector of structs (here, v) is constructed in some legacy code. In my own code, this array has to be communicated to a GPU (through cudaMemcpy).

The first solution would be to generate a new array of int with twice the dimension of the original array. Communicating this new array of int to the GPU is then straightforward.

However, I want to avoid making this copy in CPU memory, to save both time and memory.
Instead, I hope to safely get a pointer to the first element of the vector of structs. Can I be sure that this is safe? Can I be sure that the p0 and p1 fields are laid out in memory as (p0, p1, p0, p1, ...) when reading through v? In other words, can I be sure that *(ptr+2*i) and *(ptr+2*i+1) will be the p0 and p1 integers, respectively, of the i-th element of the original array v, when ptr points to the start of that array's data?

Below is a minimal, self-contained illustrative example. From this piece of code, it seems that everything works. Will this always be the case?

/*
Compilation: nvcc main.cu -o main.cuda
(or: g++ main.cpp -o main, after removing the CUDA calls (cudaMalloc, cudaMemcpy, cudaFree) and the cuda_runtime.h include)
*/
#include <vector>
#include <cassert>
#include <cuda_runtime.h>

struct S {
    int p0;
    int p1;
};

int main()
{
    assert(sizeof(S)==2*sizeof(int));

    unsigned int n = 10;
    unsigned int dimension = (1UL)<<n; // =2^n

    std::vector<S> v(dimension);
    // initialize array of struct - done in legacy code
    int shift = 10;
    for (int i=0; i<dimension; ++i) {
        v[i].p0 = shift+2*i;
        v[i].p1 = shift+2*i+1;
    }

    int *d_v;
    cudaMalloc((void**)&d_v, 2*dimension*sizeof(int));

    // solution 1 - to be avoided - re-allocate the array on CPU memory and copy values to GPU
    std::vector<int> v2(2*dimension);
    for (int i=0; i<dimension; ++i) {
        v2[2*i] = v[i].p0;
        v2[2*i+1] = v[i].p1;
    }
    cudaMemcpy(d_v, v2.data(), 2*dimension*sizeof(int), cudaMemcpyHostToDevice); // note: the size argument is in bytes

    // solution 2 (?)
    int * ptr = (int*)v.data(); // safe ?
    // ptr should point to the "p0" field of the first element of v. Let's check:
    for (int i=0; i<2*dimension; ++i) {
        assert( *(ptr+i) == shift+i ); // Ok. Is it always true ?
    }
    cudaMemcpy(d_v, ptr, 2*dimension*sizeof(int), cudaMemcpyHostToDevice); // Is it fully safe ? (size argument is in bytes)

    /* ... use d_v in Cuda kernels */

    cudaFree(d_v);

    return 0;
}

There are 2 best solutions below


You asked:

In other words, can I be sure that *(ptr+2*i) and *(ptr+2*i+1) will be the p0 and p1 integers, respectively, of the i-th element of the original array v, when ptr points to the start of that array's data?

Strictly speaking, I am unable to find anything in the language which guarantees that.

If you are able to verify that sizeof(S) is equal to 2*sizeof(int), I don't see any reason why that would not be true.

That is, inserting the following line in your code should be enough to guarantee that it does not access memory inappropriately.

static_assert(sizeof(S) == 2*sizeof(int), "Objects are not aligned properly");
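
For additional compile-time confidence, one could also check that S is a standard-layout type and that p0 and p1 sit exactly where the int-array view expects them. A minimal sketch of such checks (assuming <type_traits> and <cstddef> are included) could be:

#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

// Compile-time checks: S has standard layout, contains no padding,
// and its members sit exactly where the int-array view expects them.
static_assert(std::is_standard_layout<S>::value, "S must be standard-layout");
static_assert(sizeof(S) == 2*sizeof(int), "S must contain exactly two ints, no padding");
static_assert(offsetof(S, p0) == 0, "p0 must be the first member");
static_assert(offsetof(S, p1) == sizeof(int), "p1 must immediately follow p0");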

On a separate note, I would change

int* ptr = (int*)v.data();

to

int* ptr = &(v.data()[0].p0);
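
Whichever form you use, note that the size argument to cudaMemcpy is a byte count, so (reusing v and d_v from the question) the copy would look roughly like:

// Copy the whole vector in one call; the third argument is in bytes.
int* ptr = &(v.data()[0].p0);
cudaMemcpy(d_v, ptr, v.size()*sizeof(S), cudaMemcpyHostToDevice);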

An alternative approach, which would keep you from having to language-lawyer your way around formal and actual bugs, could be the following:

  • Allocate the raw integers, e.g. using std::make_unique<int[]>(2*dimension), or even with std::make_unique_for_overwrite.
  • Use a std::span<int> when you want to refer to the entire double-length array of raw ints.
  • Only ever pass the span around to functions which need to use it (and not resize it).
  • When you need a struct S, construct it on the spot from a pair of consecutive integers in the span. The construction will likely be optimized away by the compiler: if you have foo(const S my_s) and you invoke foo(S{my_ints[456], my_ints[457]}), it is very possible that no actual construction happens and those ints are simply placed in CPU registers to be used there.

(If you'd like to see how this would be done in code, ask in a comment.)
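
To illustrate, here is a minimal sketch of that approach (an added illustration, not the answer author's code; it assumes C++20 for std::span and std::make_unique_for_overwrite, and a hypothetical consumer foo):

#include <cstddef>
#include <memory>
#include <span>

struct S {
    int p0;
    int p1;
};

// Hypothetical consumer that wants a struct S.
void foo(const S s) { (void)s; /* ... */ }

int main()
{
    const std::size_t dimension = 1024;

    // Own the raw integers directly; there is no struct layout to reason about.
    auto storage = std::make_unique_for_overwrite<int[]>(2*dimension);
    std::span<int> ints(storage.get(), 2*dimension);

    // Fill the span (this would replace the legacy initialization of v).
    for (std::size_t i = 0; i < ints.size(); ++i)
        ints[i] = static_cast<int>(i);

    // Construct an S on the spot from two consecutive ints when one is needed;
    // the compiler will typically keep the values in registers.
    foo(S{ints[456], ints[457]});

    // ints.data() and ints.size()*sizeof(int) are what cudaMemcpy would take.
    return 0;
}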