I have a kernel that takes no input and whose work items don't communicate with each other. Each work item operates on a different argument derived from its global_id, but this is not passed in. I want each work item to process its task, screen the result against some criteria, and write the result into a global memory array only if it meets those criteria. What is the best way to do this? I considered a __global index that would start at 0 and increment on each write, but there is no lock on this access and the parallel work items end up racing each other, so I don't know where to tell each work item to write in the output array.
If this were a higher-level language, I would expect to be able to pass in a shared hash or something similar and just push the successful outputs onto it, keyed by global_id, but I'm having trouble figuring out the most appropriate way to do this in OpenCL land. Any thoughts? I am using vanilla C, not C++.
This looks like exactly what I needed; I just lacked the Google-fu to find it!
Please respond if you have any other suggestions on best practices, but for future reference, the above coupled with a __global memory buffer will fulfill my needs.
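For anyone landing here later, one common way to hand out output slots without a race is an atomic counter: each work item that passes the filter atomically bumps the counter and writes to the slot number it got back. The sketch below is only a model of that pattern in plain C11 (stdatomic), so it compiles without an OpenCL runtime; it is not kernel code. The names compute, passes, result_t, work_item and run_all are made up for illustration, and the loop in run_all is a sequential stand-in for launching the work items. In an actual OpenCL 1.1+ kernel, assuming the device supports 32-bit global atomics, the counter would be a volatile __global pointer and the reservation would be atomic_inc(counter).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical result record: which work item produced which value. */
typedef struct {
    unsigned id;
    unsigned value;
} result_t;

/* Hypothetical per-item computation, derived only from the work item id. */
static unsigned compute(unsigned gid) { return gid * gid; }

/* Hypothetical screening criterion. */
static int passes(unsigned v) { return v % 3u == 0u; }

/* Body of one work item. The counter plays the role of the shared
 * __global index; atomic_fetch_add returns the old value, so every
 * passing work item gets a distinct slot even when run concurrently. */
static void work_item(unsigned gid, atomic_uint *counter,
                      result_t *out, unsigned capacity)
{
    unsigned v = compute(gid);
    if (!passes(v))
        return;

    unsigned slot = atomic_fetch_add(counter, 1u);
    if (slot < capacity) {          /* drop results that do not fit */
        out[slot].id = gid;
        out[slot].value = v;
    }
}

/* Sequential stand-in for a kernel launch over n work items.
 * Returns the number of results actually stored in out. */
static unsigned run_all(unsigned n, result_t *out, unsigned capacity)
{
    assert(out != NULL || capacity == 0u);

    atomic_uint counter;
    atomic_init(&counter, 0u);

    for (unsigned gid = 0; gid < n; ++gid)
        work_item(gid, &counter, out, capacity);

    unsigned total = atomic_load(&counter);
    return total < capacity ? total : capacity;
}
```

On a real device the order of entries in the output array is not deterministic, which is why each record carries its own id. The final counter value tells the host how many work items passed the filter, so a count larger than the buffer capacity signals that some results were dropped.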