I have a std::vector of a custom struct which contains two integers (and only two integers):
struct S {
int p0;
int p1;
};
std::vector<S> v(dimension);
I want a pointer to this array of struct, to be interpreted as an array of int with twice the original dimension.
The context is the following: the std::vector of structs (here, v) is constructed in some legacy code. In my own code, this array has to be transferred to a GPU (through cudaMemcpy).
The first solution would be to generate a new array of int with twice the dimension of the original array. Transferring the new array of int to the GPU is then straightforward.
However, I want to avoid making this copy in CPU memory, to save time and memory.
Instead, I hope to safely get a pointer to the first element of the vector of structs. Can I be sure that this is safe? Can I be sure that the p0, p1 fields are laid out in memory as (p0, p1, p0, p1, ...) when reading through v? In other words, can I be sure that *(ptr+2*i) and *(ptr+2*i+1) will be the p0 and p1 integers, respectively, of the i-th element of the original array v, when ptr points to the start of this original array?
Below is a minimal, self-contained illustrative example. From this piece of code, it seems that everything works. Will this always be the case?
/*
  Compilation: nvcc main.cu -o main.cuda
  or g++ main.cpp -o main after removing the CUDA include and the cudaMalloc, cudaMemcpy and cudaFree calls
*/
#include <vector>
#include <cassert>
#include <cuda_runtime.h>

struct S {
    int p0;
    int p1;
};

int main()
{
    assert(sizeof(S)==2*sizeof(int));

    unsigned int n = 10;
    unsigned int dimension = (1UL)<<n; // =2^n
    std::vector<S> v(dimension);

    // initialize array of struct - done in legacy code
    int shift = 10;
    for (int i=0; i<dimension; ++i) {
        v[i].p0 = shift+2*i;
        v[i].p1 = shift+2*i+1;
    }

    int *d_v;
    cudaMalloc((void**)&d_v, 2*dimension*sizeof(int));

    // solution 1 - to be avoided - re-allocate the array on CPU memory and copy values to GPU
    std::vector<int> v2(2*dimension);
    for (int i=0; i<dimension; ++i) {
        v2[2*i] = v[i].p0;
        v2[2*i+1] = v[i].p1;
    }
    // the third argument of cudaMemcpy is a size in bytes, not an element count
    cudaMemcpy(d_v, v2.data(), 2*dimension*sizeof(int), cudaMemcpyHostToDevice);

    // solution 2 (?)
    int * ptr = (int*)v.data(); // safe ?
    // ptr should point to the "p0" field of the first element of v. Let's check:
    for (int i=0; i<2*dimension; ++i) {
        assert( *(ptr+i) == shift+i ); // Ok. Is it always true ?
    }
    cudaMemcpy(d_v, ptr, 2*dimension*sizeof(int), cudaMemcpyHostToDevice); // Is it fully safe ?

    /* ... use d_v in Cuda kernels */

    cudaFree(d_v);
    return 0;
}
You asked:
Strictly speaking, I am unable to find anything in the language which guarantees that.
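To illustrate why the language makes no general promise here, the following is a minimal sketch. Padded is a hypothetical struct that does not appear in the question; it is included only because compilers are allowed to insert padding between or after members, and typically do so for it. The helper function names are also made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct, used only to show that padding can exist.
struct Padded {
    char c;
    int  i;
};

// The two-int struct from the question.
struct S {
    int p0;
    int p1;
};

// Bytes of Padded that are not accounted for by its members.
constexpr std::size_t padding_bytes_in_padded() {
    return sizeof(Padded) - (sizeof(char) + sizeof(int));
}

// Bytes of S that are not accounted for by its two ints.
constexpr std::size_t padding_bytes_in_s() {
    return sizeof(S) - 2 * sizeof(int);
}
```

On common desktop platforms padding_bytes_in_padded() is typically nonzero while padding_bytes_in_s() is 0, but neither value is fixed by the standard, which is why an explicit check on sizeof(S) is worth having.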
If you are able to verify that sizeof(S) is equal to 2*sizeof(int), I don't see any reason why that would not be true. That is, inserting the following line in your code should be a sufficient guarantee that your code does not use memory inappropriately.
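One possible way to write such a check, as a sketch rather than a definitive recipe, is to turn the runtime assert into compile-time static_asserts. The extra checks on standard layout, trivial copyability and member offsets are assumptions about what is useful here, not something the question requires. The helper flatten_bytes is hypothetical and uses std::memcpy as a stand-in for cudaMemcpy so that the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct S {
    int p0;
    int p1;
};

// Compile-time layout checks: the build fails if any assumption is violated.
static_assert(std::is_standard_layout<S>::value, "S must be standard-layout");
static_assert(std::is_trivially_copyable<S>::value, "S must be trivially copyable");
static_assert(sizeof(S) == 2 * sizeof(int), "S must not contain padding");
static_assert(offsetof(S, p0) == 0, "p0 must come first");
static_assert(offsetof(S, p1) == sizeof(int), "p1 must directly follow p0");

// Hypothetical helper: copies the raw bytes of v into a vector of int.
// std::memcpy stands in for cudaMemcpy (both take a size in bytes).
std::vector<int> flatten_bytes(const std::vector<S>& v) {
    std::vector<int> out(2 * v.size());
    if (!v.empty()) {
        std::memcpy(out.data(), v.data(), v.size() * sizeof(S));
    }
    return out;
}
```

With CUDA, the corresponding call would be cudaMemcpy(d_v, v.data(), v.size()*sizeof(S), cudaMemcpyHostToDevice). Since cudaMemcpy takes a const void* source and a byte count, the host-side cast to int* is not needed for the transfer itself.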
On a separate note, I would change
to