Best method for GPU to CPU communication in OpenCL

I have a kernel that takes no input and whose work items don't communicate with each other. Each work item operates on a different argument derived from its global_id; the argument itself is not passed in. I want each work item to process its task, screen the result against some criteria, and write the result back into a global memory array only if it meets those criteria. What is the best way to do this? I considered a __global index that would start at 0 and be incremented on each write, but access to it is not locked, so the parallel work items race on it and I can't tell each work item where in the output array to write.

If this were a higher-level language, I would expect to be able to pass in a shared hash or something similar and just push the successful outputs onto it, keyed by global_id, but I'm having trouble figuring out the most appropriate way to do this in OpenCL land. Any thoughts? I am using vanilla C, not C++.

There is 1 best solution below

This looks like exactly what I needed; I just lacked the Google-fu to find it!

Please respond if you have any other suggestions on best practices, but for future reference, the above coupled with a __global memory buffer will fulfill my needs.
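
For future readers, here is a minimal sketch of one common way to do this, assuming the suggestion referred to above is an atomic counter (the original link isn't preserved here, so that is a guess). Each work item that passes the screen atomically increments a shared counter and uses the value returned as its private slot in the output buffer, so no two work items write to the same index. The sketch is plain C11 so it can be run on the host: in a real kernel, `atomic_fetch_add(count, 1)` would be `atomic_inc(count)` on a `volatile __global int *`, the loop over `gid` would be replaced by `get_global_id(0)`, and `out` would be the __global buffer. The names `result_t`, `compute`, `passes`, `work_item` and `run_all` are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One surviving result: which work item produced it, and its value. */
typedef struct {
    int gid;
    int value;
} result_t;

/* Hypothetical stand-ins for the real per-item computation and screen. */
static int compute(int gid) { return gid * gid; }
static int passes(int value) { return value % 3 == 0; }

/* Body of one "work item". In OpenCL this would be the kernel body,
 * with gid = get_global_id(0) and count/out living in __global memory. */
static void work_item(int gid, atomic_int *count, result_t *out, int capacity)
{
    int value = compute(gid);
    if (!passes(value))
        return;

    /* Atomically reserve the next free slot. The old counter value is
     * returned, so every successful work item gets a distinct index. */
    int slot = atomic_fetch_add(count, 1);

    /* Guard against overflowing the output buffer. */
    if (slot < capacity) {
        out[slot].gid = gid;
        out[slot].value = value;
    }
}

/* Host-side model of launching global_size work items (sequentially here).
 * Returns the number of results actually stored in out. */
static int run_all(int global_size, result_t *out, int capacity)
{
    assert(out != NULL && capacity >= 0);

    atomic_int count;
    atomic_init(&count, 0); /* the counter must start at 0 before launch */

    for (int gid = 0; gid < global_size; ++gid)
        work_item(gid, &count, out, capacity);

    int total = atomic_load(&count);
    return total < capacity ? total : capacity;
}
```

Two things to keep in mind when moving this to the GPU: the order in which slots are handed out is not deterministic there, which is why each record stores its `gid`; and after the kernel finishes, the host has to read the counter back (clamped to the buffer capacity) to know how many entries of the output buffer are valid.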