I'm using the advice given here for choosing an optimal GPU for my algorithm. https://stackoverflow.com/a/33488953/5371117
I query the devices on my MacBook Pro using boost::compute::system::devices(), which returns the following list of devices:
Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
Intel(R) UHD Graphics 630
AMD Radeon Pro 560X Compute Engine
I want to use the AMD Radeon Pro 560X Compute Engine, but when I iterate over the devices to find the one with the maximum rating = CL_DEVICE_MAX_CLOCK_FREQUENCY * CL_DEVICE_MAX_COMPUTE_UNITS, I get the following results:
Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz,
freq: 2600, units: 12, rating:31200
Intel(R) UHD Graphics 630,
freq: 1150, units: 24, rating:27600
AMD Radeon Pro 560X Compute Engine,
freq: 300, units: 16, rating:4800
The AMD GPU has the lowest rating. I also looked into the specs, and it seems to me that CL_DEVICE_MAX_CLOCK_FREQUENCY isn't returning the correct value.
According to the AMD chip specs https://www.amd.com/en/products/graphics/radeon-rx-560x, my AMD GPU has a base frequency of 1175 MHz, not 300 MHz.
According to the Intel chip specs https://en.wikichip.org/wiki/intel/uhd_graphics/630, my Intel GPU has a base frequency of 300 MHz, not 1150 MHz; it does, however, have a boost frequency of 1150 MHz.
std::vector<boost::compute::device> devices = boost::compute::system::devices();
std::pair<boost::compute::device, ai::int64> suitableDevice{};
for(auto& device: devices)
{
    auto rating = device.clock_frequency() * device.compute_units();
    std::cout << device.name() << ", freq: " << device.clock_frequency()
              << ", units: " << device.compute_units() << ", rating:" << rating << std::endl;
    if(suitableDevice.second < rating) // was comparing against an undeclared 'benchmark'
    {
        suitableDevice.first = device;
        suitableDevice.second = rating;
    }
}
Am I doing anything wrong?
Those properties are unfortunately only directly comparable within a single implementation (same hardware vendor, same OS).
My recommendation would be to:
- Only consider devices of type CL_DEVICE_TYPE_GPU (unless there aren't any GPUs available, in which case you may want to fall back to the CPU).
- Check each GPU's CL_DEVICE_HOST_UNIFIED_MEMORY property. Devices where it is true are integrated GPUs, and these are usually slower than discrete ones, unless you are bound by data transfer speeds, in which case they might be faster. So you'll want to prefer one type over the other.
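A minimal sketch of that selection policy, using a hypothetical DeviceInfo struct so the logic is self-contained; with Boost.Compute you would populate these fields from device.type(), device.get_info<cl_bool>(CL_DEVICE_HOST_UNIFIED_MEMORY), device.clock_frequency() and device.compute_units():

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the handful of device properties the policy
// needs (not a Boost.Compute type).
struct DeviceInfo {
    std::string name;
    bool is_gpu;
    bool host_unified_memory; // true => integrated GPU
    std::uint32_t clock_mhz;
    std::uint32_t compute_units;
};

// Prefer discrete GPUs over integrated ones, and any GPU over the CPU;
// only within the same class fall back to the rough clock * units rating.
const DeviceInfo* pick_device(const std::vector<DeviceInfo>& devices) {
    auto cls = [](const DeviceInfo& d) {
        // 2 = discrete GPU, 1 = integrated GPU, 0 = CPU
        if (!d.is_gpu) return 0;
        return d.host_unified_memory ? 1 : 2;
    };
    const DeviceInfo* best = nullptr;
    for (const auto& d : devices) {
        if (!best
            || cls(d) > cls(*best)
            || (cls(d) == cls(*best)
                && d.clock_mhz * d.compute_units
                   > best->clock_mhz * best->compute_units))
            best = &d;
    }
    return best;
}
```

With the device list from the question, this picks the AMD Radeon Pro 560X even though its clock * units rating is the lowest, because it is the only discrete GPU.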