My goal is to run two FFTs of 4K data points each at the same time. To do this, I created two kernel objects from the same kernel and enqueued both tasks back to back, i.e. without any buffer reads or writes or any callbacks in between. However, the two kernels do not run concurrently. In addition, there is some idle time between the two executions. Can someone please explain why?

[Image: AOCL report of the program]

I was expecting both of them to run in parallel because my FPGA seems to have enough free area: only about 38 percent of it is used.

Best answer

I found this question that kind of answers my doubts. It can be found here.

Answer

An OpenCL command queue is in-order by default, so one kernel is executed after the other. This ensures that if kernel 2 reads memory that kernel 1 has updated, there is no race condition, as there could be if the two ran concurrently. There may also be some latency before a kernel starts executing.

To run multiple kernels in parallel, you can try multiple queues.