Zero-copy buffer allocation on Arm Mali Midgard GPUs?

I want zero-copy behaviour for OpenCL buffers shared between Arm Mali Midgard GPUs and Arm CPUs, such that a vector's data pointer and an OpenCL buffer point to the same location for their whole lifetime.

Here is what I have tried so far. I wrote a custom allocator (64-byte alignment) for a vector and then tried to pass the vector's data pointer to the cl_arm_import_memory import function. The problem is that when I query the device's extension string, I see only cl_arm_import_memory and not cl_arm_import_memory_host.
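
When checking for the host-import extension, a plain substring search can mislead, because cl_arm_import_memory is a prefix of cl_arm_import_memory_host. The following is a minimal sketch of an exact-token check. It assumes the extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS; the helper name hasExtension is hypothetical, not part of OpenCL.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true only if `name` appears as a complete,
// whitespace-separated token in `extensions` (the CL_DEVICE_EXTENSIONS string).
// A substring search would wrongly find "cl_arm_import_memory" inside
// "cl_arm_import_memory_host", so tokens are compared for exact equality.
bool hasExtension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

Calling hasExtension(extensions, "cl_arm_import_memory_host") then reports whether that exact extension name is advertised, independent of the base cl_arm_import_memory entry.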

I have also tried allocating a GPU-side buffer first and then forcing a vector to point to the buffer's location. But according to the Mali guide, a GPU-side buffer's location might change, so it may appear at different addresses across multiple mappings.

So my question is: what is the best way to achieve zero-copy behaviour between a std::vector and an OpenCL buffer?

There are 2 best solutions below

I think you're mixing two unrelated concepts: zero copy and shared virtual memory. Zero copy does not guarantee that a piece of physical memory is visible at the same address on both the CPU and the GPU; it can be mapped at different addresses in the CPU's and the GPU's virtual address spaces. If you want the physical memory to have the same virtual address on the GPU and the CPU, you need shared virtual memory (SVM). This requires OpenCL 2.x and allocating buffers through clSVMAlloc(). If your vendor provides only OpenCL 1.x and not 2.x, you're out of luck: you can have zero-copy buffers, but not SVM.
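
Where SVM is available, one possible way to tie a std::vector to SVM memory is a custom allocator. The sketch below is illustrative only: so that it builds without an OpenCL SDK, the allocation calls are replaced by hypothetical stand-ins (svmAllocStub, svmFreeStub); in real code they would forward to clSVMAlloc(context, flags, size, alignment) and clSVMFree(context, ptr). The names SvmAllocator and SvmVector are likewise assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without OpenCL headers.
// Real code would call clSVMAlloc(context, CL_MEM_READ_WRITE, bytes, alignment)
// and clSVMFree(context, ptr) here instead.
inline void* svmAllocStub(std::size_t bytes, std::size_t alignment) {
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded);
}
inline void svmFreeStub(void* ptr) { std::free(ptr); }

// Minimal allocator that draws vector storage from the (stubbed) SVM calls.
template <typename T>
struct SvmAllocator {
    using value_type = T;
    static constexpr std::size_t kAlignment = 64;  // 64-byte alignment, as in the question

    SvmAllocator() = default;
    template <typename U>
    SvmAllocator(const SvmAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = svmAllocStub(n * sizeof(T), kAlignment);
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { svmFreeStub(p); }
};

template <typename T, typename U>
bool operator==(const SvmAllocator<T>&, const SvmAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const SvmAllocator<T>&, const SvmAllocator<U>&) { return false; }

// A vector whose data() pointer lives in allocator-provided memory.
// Note: any reallocation changes data(), so reserve the final capacity up
// front before handing the pointer to the device.
template <typename T>
using SvmVector = std::vector<T, SvmAllocator<T>>;
```

With a real SVM allocation behind it, the pointer from data() would be passed to the kernel with clSetKernelArgSVMPointer. The vector must not grow past its reserved capacity while the device uses the pointer, and coarse-grained SVM additionally needs clEnqueueSVMMap and clEnqueueSVMUnmap around host access.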

Try this:

  1. Create the buffer with CL_MEM_ALLOC_HOST_PTR.
  2. Call clEnqueueMapBuffer to get a host-side pointer.

Sample code:

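// Let the driver allocate memory that both the CPU and the GPU can access.
deviceBuffer = clCreateBuffer(cl->context,
                              CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                              sizeof(T) * dataLength,
                              nullptr,
                              &error);
checkError(error);

// Blocking map: returns a host pointer to the buffer's memory.
hostPtr = static_cast<T *>(clEnqueueMapBuffer(cl->memCmdQueue,
                                              deviceBuffer,
                                              CL_TRUE,
                                              CL_MAP_WRITE,
                                              0,
                                              sizeof(T) * dataLength,
                                              0, nullptr, nullptr, &error));
checkError(error);
// Unmap with clEnqueueUnmapMemObject before a kernel uses the buffer.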
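
Because the driver owns memory allocated with CL_MEM_ALLOC_HOST_PTR, a std::vector cannot simply adopt the mapped pointer. One option, sketched here under that assumption, is a small non-owning view that gives container-like access to the mapped region. The class name MappedView is hypothetical; with C++20, std::span serves the same purpose.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over a mapped region, e.g. the pointer returned
// by clEnqueueMapBuffer. It never allocates or frees; the view is only valid
// while the buffer stays mapped.
template <typename T>
class MappedView {
public:
    MappedView(T* data, std::size_t size) : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }

    T* data() const { return data_; }
    std::size_t size() const { return size_; }

    // begin()/end() allow range-based for loops and standard algorithms.
    T* begin() const { return data_; }
    T* end() const { return data_ + size_; }

    T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    T* data_;
    std::size_t size_;
};
```

In the sample above this would be constructed as MappedView<T>(hostPtr, dataLength) after the map call, and discarded before the buffer is unmapped.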