The function signature of my kernel is as follows:
template< size_t S, typename Field, typename Type1, typename Type2>
void kernel(const Type1 arg1, const Type2 arg2, Field *results) {
// S is known at compile time
// Field might be float or double
// Type1 is an object holding data and also methods
// Type2 is an object holding data and also methods
// The computation starts here
}
I know that it is possible to use a subset of C++ features to write the kernel using an extension to AMD's OpenCL implementation, but the resulting code is restricted to running on AMD cards only.
The standard OpenCL specification for versions prior to 2.0 constrains the programmer to C99 for writing kernels, and I believe that versions 2.1 and 2.2 are not yet widely available in Linux distributions. However, I found here that Boost.Compute allows, to some extent, a subset of C++ features to be used when specifying kernels. It is not clear, though, whether it is possible to implement a kernel signature like the one in the snippet above using Boost.Compute.

To what extent is it possible to implement such a kernel? Code examples would be very much appreciated.
TL;DR: yes and no. It is indeed possible, to some extent, to write templated kernels, but they aren't nearly as powerful as their CUDA counterparts.
It isn't restricted to running on AMD cards only; it is restricted to being compiled with AMD's OpenCL implementation only. For example, it should run on Intel CPUs just fine, as long as it was compiled with AMD's implementation.
Boost.Compute is essentially a fancy abstraction layer above the OpenCL C API to make it more palatable and less tedious to work with, but it still gives you full access to the underlying C API. This means that if something is feasible from the C API, it should in theory also be feasible from Boost.Compute.
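For instance, a minimal sketch of that workflow might look like the following (the fill kernel here is made up purely for illustration): the program is built from plain OpenCL C source at runtime, and the wrapper objects still expose the raw OpenCL handles through get().

#include <boost/compute/core.hpp>
#include <boost/compute/utility/source.hpp>

namespace compute = boost::compute;

int main()
{
    // pick a device and set up a context for it
    compute::device device = compute::system::default_device();
    compute::context context(device);

    // plain OpenCL C source, compiled at runtime
    const char source[] = BOOST_COMPUTE_STRINGIZE_SOURCE(
        __kernel void fill(__global float *out, float value)
        {
            out[get_global_id(0)] = value;
        }
    );

    compute::program program =
        compute::program::build_with_source(source, context);
    compute::kernel kernel(program, "fill");

    // the raw C handle is still there if you need it
    cl_kernel raw_kernel = kernel.get();
    (void)raw_kernel;
}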
Since OpenCL code is compiled at runtime, in a separate pass, you won't be able to automatically do template instantiation the way CUDA does it at compile time. The CUDA compiler sees both host and device code and can do proper template instantiation across the entire call graph, as if it were a single translation unit. This is impossible in OpenCL, by design.
1. You will have to manually instantiate all the possible template instantiations you need, mangle their names, and dispatch to the proper instantiation (a sketch follows below).
2. All types used in template instantiations must be defined in OpenCL code too.
This restriction makes OpenCL templated kernels not entirely useless, but also not very practical compared to CUDA ones. Their main purpose is to avoid code duplication.
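To give a feel for what "manually instantiate, mangle and dispatch" means in practice, here is a rough sketch using Boost.Compute (the saxpy kernel and the make_saxpy_kernel helper are invented for this example): the host generates one OpenCL C source string per element type, mangles the kernel name by hand, and builds it at runtime.

#include <string>
#include <boost/compute/core.hpp>
#include <boost/compute/type_traits/type_name.hpp>

namespace compute = boost::compute;

// Hypothetical helper: emit one "instantiation" of a templated kernel by
// pasting the OpenCL type name into the source and mangling the kernel name.
template<typename Field>
compute::kernel make_saxpy_kernel(const compute::context &context)
{
    const std::string field = compute::type_name<Field>(); // "float", "double", ...
    const std::string name = "saxpy_" + field;             // manual name mangling

    // note: double requires the cl_khr_fp64 extension on some devices
    const std::string source =
        "__kernel void " + name + "(__global const " + field + " *x,\n"
        "                           __global " + field + " *y,\n"
        "                           " + field + " a)\n"
        "{\n"
        "    const size_t i = get_global_id(0);\n"
        "    y[i] = a * x[i] + y[i];\n"
        "}\n";

    compute::program program =
        compute::program::build_with_source(source, context);
    return compute::kernel(program, name); // dispatch by the mangled name
}

Calling make_saxpy_kernel<float>(context) and make_saxpy_kernel<double>(context) then gives you the two instantiations, and it is up to the host code to pick the right one.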
Another consequence of this design is that non-type template parameters aren't allowed in the template argument lists of kernel templates (at least as far as I know, but I would really like to be wrong on this one!). This means you'll have to lower the non-type template parameter of the kernel template into a non-type template parameter of one of the argument types. In other words, transform something along the lines of the signature from your question:
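// roughly the signature from the question: non-type parameter S on the kernel itself
template<std::size_t S, typename Field, typename Type1, typename Type2>
void kernel(const Type1 arg1, const Type2 arg2, Field *results);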
Into something like this:
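// S lowered into the type of an extra, otherwise unused, first argument
template<typename Size, typename Field, typename Type1, typename Type2>
void kernel(const Size *, const Type1 arg1, const Type2 arg2, Field *results);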
Different instantiations are then distinguished by passing something similar in spirit to std::integral_constant<std::size_t, 512> (or any other type that can be templated on an integer constant) as the first argument. The pointer here is just a trick to avoid requiring a host-side definition of the size type (because we don't care about it).

Good luck, and feel free to edit this post with your changes so that others may benefit from it!