I have a CUDA kernel that takes a list of structs.
kernel<<<blockCount,blockSize>>>(MyStruct *structs);
Each struct contains 3 pointers.
typedef struct __align__(16) {
    float* pointer1;
    float* pointer2;
    float* pointer3;
} MyStruct;
I have three device arrays containing floats, and each pointer within the struct points to a float within one of the three device arrays.
The list of structs represents a tree/graph structure, which allows the kernel to execute recursive operations depending on the order of the list of structs that is sent to the kernel. (This bit works in C++, so it is not related to my problem.)
What I would like to do is be able to send my struct of pointers from JCuda. I understand that this isn't natively possible unless it is flattened to a padded array as in this post.
I understand all the issues with alignment and padding that may arise when sending a list of structs. It's essentially a repeating padded array, which I am fine with.
The bit I am not sure how to do is populating my flattened struct buffer with pointers. For example, I would think I can do something like this:
Pointer A = ....(underlying device array1)
Pointer B = ....(underlying device array2)
Pointer C = ....(underlying device array3)
// Assuming 8-byte pointers: 3 pointers + 8 bytes of padding = 32 bytes per struct
ByteBuffer structListBuffer = ByteBuffer.allocate(32*noSteps);
for(int x = 0; x<noSteps; x++) {
// Get the underlying pointer values
long pointer1 = A.withByteOffset(getStepOffsetA(x)).someGetUnderlyingPointerValueFunction();
long pointer2 = B.withByteOffset(getStepOffsetB(x)).someGetUnderlyingPointerValueFunction();
long pointer3 = C.withByteOffset(getStepOffsetC(x)).someGetUnderlyingPointerValueFunction();
// Build the struct
structListBuffer.putLong(pointer1);
structListBuffer.putLong(pointer2);
structListBuffer.putLong(pointer3);
structListBuffer.putLong(0); // padding
}
structListBuffer would then contain a list of structs in the way that the kernel would expect it.
So is there any way to do the someGetUnderlyingPointerValueFunction() from a ByteBuffer?
If I understood everything correctly, the main point of the question is whether there is a magic function like someGetUnderlyingPointerValueFunction() that returns the address of the native pointer.
The short answer: No, there is no such function.
(Side note: A similar functionality was already requested quite a while ago, but I have not yet added it, mainly because such a function does not make sense for pointers to Java arrays or (non-direct) byte buffers. Additionally, manually handling structs with their paddings and alignments, pointers with different sizes on 32 and 64 bit machines, and buffers that are big- or little-endian is an endless source of headaches. But I see the point, and the possible application case, and so I'll most likely add something like a getAddress() function. Maybe only to the CUdeviceptr class, where it definitely makes sense - at least more than in the Pointer class. People will use this method to do odd things, and they will do things that will cause nasty crashes of the VM, but JCuda itself is such a thin abstraction layer that there is no safety net in this regard anyhow...)
That said, you can work around the current limitation with a method like this:
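The listing that belonged here did not survive, so what follows is only a minimal sketch of the idea: a small subclass that exposes the protected accessors. To keep it runnable without JCuda on the classpath, it uses a hypothetical StandInPointer class in place of jcuda.Pointer. The names StandInPointer, PointerAddresses and getPointerAddress are assumptions made for illustration. With the real library, the helper subclass would extend jcuda.Pointer instead, assuming the protected copy constructor and accessors of your JCuda version.

```java
// Hypothetical stand-in for jcuda.Pointer, for illustration only: it keeps a
// native address and a byte offset behind protected accessors.
class StandInPointer {
    private final long nativePointer;
    private final long byteOffset;

    StandInPointer(long nativePointer, long byteOffset) {
        this.nativePointer = nativePointer;
        this.byteOffset = byteOffset;
    }

    // Copy constructor, so that a subclass can wrap an existing pointer.
    protected StandInPointer(StandInPointer other) {
        this(other.nativePointer, other.byteOffset);
    }

    protected long getNativePointer() { return nativePointer; }

    protected long getByteOffset() { return byteOffset; }
}

class PointerAddresses {
    // Subclass whose only purpose is to reach the protected accessors.
    // In this single-file sketch all classes share a package, so protected
    // members would be reachable anyway; with a library class in another
    // package, the subclass is what grants the access.
    private static final class PointerWithAddress extends StandInPointer {
        PointerWithAddress(StandInPointer other) { super(other); }

        long getAddress() { return getNativePointer() + getByteOffset(); }
    }

    // Returns base address plus byte offset of the given pointer.
    static long getPointerAddress(StandInPointer p) {
        return new PointerWithAddress(p).getAddress();
    }
}
```

The returned value is only meaningful for pointers that wrap native (device) memory. For pointers to Java arrays or non-direct buffers there is no stable address to report.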
Of course, this is ugly and clearly contradicts the intention of making the getNativePointer() and getByteOffset() methods protected. But it might eventually be replaced with some "official" method, and until then, this is probably the solution that is closest to what you can do on the C side.
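Once the addresses are available as long values, the struct list from the question can be packed into a host buffer. The following is a rough sketch under stated assumptions, not a definitive implementation: the class name StructListPacker and its methods are hypothetical, the struct is assumed to hold three pointers with 16-byte alignment, and the host byte order is assumed to match the device.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: packs triples of pointer addresses into the memory
// layout of an array of 16-byte-aligned structs with three pointers each.
class StructListPacker {
    static final int POINTERS_PER_STRUCT = 3;
    static final int ALIGNMENT = 16;

    // Size of one struct: three pointers, rounded up to the alignment.
    // This gives 16 bytes for 4-byte pointers and 32 for 8-byte pointers.
    static int structSize(int pointerSize) {
        int raw = POINTERS_PER_STRUCT * pointerSize;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // addresses[i] holds the three pointer values of struct i.
    static ByteBuffer pack(long[][] addresses, int pointerSize) {
        if (pointerSize != 4 && pointerSize != 8) {
            throw new IllegalArgumentException("pointerSize must be 4 or 8");
        }
        int stride = structSize(pointerSize);
        // Assumption: native host order is what the device expects.
        ByteBuffer buffer = ByteBuffer.allocate(stride * addresses.length)
                .order(ByteOrder.nativeOrder());
        for (int i = 0; i < addresses.length; i++) {
            if (addresses[i].length != POINTERS_PER_STRUCT) {
                throw new IllegalArgumentException("expected 3 addresses per struct");
            }
            // Jump to the start of struct i; the padding bytes stay zero.
            buffer.position(i * stride);
            for (long address : addresses[i]) {
                if (pointerSize == 8) {
                    buffer.putLong(address);
                } else {
                    buffer.putInt((int) address);
                }
            }
        }
        buffer.rewind();
        return buffer;
    }
}
```

The resulting buffer could then be wrapped with Pointer.to(buffer) and copied to the device, for example with cuMemcpyHtoD, before passing the device pointer to the kernel.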
Here is an example that I wrote for testing this. The kernel is only a dummy kernel, that fills the structure with "identifiable" values (to see whether they end up in the right place), and is supposed to be launched with 1 thread only:
This kernel is launched in the following program (Note: The PTX file is compiled here on the fly, with settings that may not match your application case. If in doubt, you may compile your PTX file manually).
The pointer1, pointer2 and pointer3 pointers of each struct are initialized so that they point to consecutive elements of the device buffers A, B and C, respectively, each with an offset that allows identifying the values that are written by the kernel. (Note that I tried to handle the two possible cases of running this either on a 32bit- or a 64bit machine, which implies different pointer sizes - although, currently, I can only test the 32bit version)
The result is, as expected/desired: