Getting CL_INVALID_WORK_GROUP_SIZE on matrix operations

I am passing in a matrix as global memory and processing each vector (row) in local memory. The actual matrix is 100 x 2025, but in the kernel I pad each row with zeros to 2048 so that I can use power-of-two operations. Each work-item processes 4 elements of the vector.

MAX_WORK_ITEM_SIZES: (512, 512, 512)
MAX_WORK_GROUP_SIZE: 512

size_t globalWorkSize[2] = { 100, 2048 };
size_t localWorkSize[1] = { 512 };

I've also tried making localWorkSize two-dimensional, {1, 512}, but I get the same error, CL_INVALID_WORK_GROUP_SIZE, from this call:

err = clEnqueueNDRangeKernel( openCLObjects.queue, openCLObjects.Normalize, 2, NULL,
                    globalWorkSize, localWorkSize, 0, NULL, NULL );

Any idea what could be going wrong?

Thanks.

Best answer:

Device properties (generic upper limits for a device):

  • MAX_WORK_ITEM_SIZES: maximum number of work-items in a workgroup, in each dimension.
  • MAX_WORK_GROUP_SIZE: maximum total number of work-items in a workgroup (the product of all dimension sizes).

Kernel properties (specific limits for a kernel compiled for a particular device):

  • CL_KERNEL_WORK_GROUP_SIZE: maximum total number of work-items in a workgroup for this kernel (the product of all dimension sizes).

The first limits are hardcoded for each device and are probably set by how many items the hardware can address in full SIMD mode.

The second limit is per kernel, and it is the one you should check instead. It takes into account things specific to your compiled code, such as how much private memory the kernel uses.

Does your local size meet this second limit as well?

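BTW: in any case, localWorkSize must have one entry per dimension, because clEnqueueNDRangeKernel reads work_dim (here 2) values from it. You should always use: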

size_t globalWorkSize[2] = { 100, 2048 };
size_t localWorkSize[2] = { 1, 512 };