How to store bool result of a CUDA kernel function

1.6k Views Asked by At

Assume that we have 2^10 CUDA cores and 2^20 data points. I want a kernel that will process these points and will provide true/false for each of them. So I will have 2^20 bits. Example:

bool f(x) { return x % 2? true : false; }

void kernel(int* input, byte* output)
{
   tidx = thread.x ...
   output[tidx] = f(input[tidx]);

   ...or...

   sharedarr[tidx] = f(input[tidx]);
   sync()
   output[blockidx] = reduce(sharedarr);

   ...or...

   atomic_result |= f(input[tidx]) << tidx;
   sync(..)
   output[blckidx] = atomic_result;
}

Thrust/CUDA has some algorithms as "partitioning", "transformation" which provides similar alternatives.

My question is, when I write the relevant CUDA kernel with a predicate that is providing the corresponding bool result,

  • should I use one byte for each result and directly store the result in the output array? Performing one step for calculation and performing another step for reduction/partitioning later.

  • should I compact the output in the shared memory, using one byte for 8 threads and then at the end write the result from shared memory to output array?

  • should I use atomic variables?

What's the best way to write such a kernel and the most logical data structure to keep the results? Is it better to use more memory and simply do more writes to main memory instead of trying to deal with compacting the result before writing back to result memory area?

1

There are 1 best solutions below

0
On

There is no tradeoff between speed and data size when using the __ballot() warp-voting intrinsic to efficiently pack the results.

Assuming that you can redefine output to be of uint32_t type, and that your block size is a multiple of the warp size (32), you can simply store the packed output using

output[tidx / warpSize] = __ballot(f(input[tidx]));

Note this makes all threads of the warp try to store the result of __ballot(). Only one thread of the warp will succeed, but as their results are all identical, it does not matter which one will.