I am trying to figure out the relationship between the clCreateCommandQueue and clEnqueueWriteBuffer APIs.
I am using a very simple example (vector addition) and AMD CodeXL to gauge the performance.
Here are the timings for different combinations of command-queue properties (set with clCreateCommandQueue) and the blocking_write flag (passed to clEnqueueWriteBuffer):
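Scenario | Command-queue properties | blocking_write | Time taken
-------- | ------------------------ | -------------- | ----------
1        | in-order                 | non-blocking   | 46.83 ms
2        | in-order                 | blocking       | 4.711 ms
3        | out-of-order             | non-blocking   | 46.55 ms
4        | out-of-order             | blocking       | 4.358 ms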
The recorded time is for a 4 MB CPU-to-GPU transfer (I got similar results for a 40 MB transfer).
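Converting the times to effective throughput puts the four rows on one scale. This is a minimal sketch, assuming the 4 MB size means 4 MiB (4 × 1024 × 1024 bytes) and that each time covers the write alone; the struct `scenario` and the functions `throughput_gib_s` and `slowdown` are illustrative names, not part of OpenCL.

```c
#include <stddef.h>

/* One row of the timing table (values copied from the table). */
struct scenario {
    const char *queue;   /* command-queue properties */
    int blocking_write;  /* 1 = blocking (CL_TRUE), 0 = non-blocking (CL_FALSE) */
    double time_ms;      /* measured time for the transfer */
};

/* Rows are stored as (non-blocking, blocking) pairs per queue type. */
static const struct scenario scenarios[] = {
    {"in-order", 0, 46.83},
    {"in-order", 1, 4.711},
    {"out-of-order", 0, 46.55},
    {"out-of-order", 1, 4.358},
};

/* Effective throughput in GiB/s for `bytes` moved in `ms` milliseconds. */
static double throughput_gib_s(double bytes, double ms) {
    return (bytes / (1024.0 * 1024.0 * 1024.0)) / (ms / 1000.0);
}

/* How many times slower the non-blocking write was than the blocking
   write on the same queue type (pair 0 = in-order, 1 = out-of-order). */
static double slowdown(size_t pair) {
    return scenarios[2 * pair].time_ms / scenarios[2 * pair + 1].time_ms;
}
```

For example, `throughput_gib_s(4.0 * 1024 * 1024, 4.711)` is about 0.83 GiB/s, against about 0.083 GiB/s for 46.83 ms, and `slowdown(0)` and `slowdown(1)` come out at roughly 9.9 and 10.7.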
From my understanding, an out-of-order queue with non-blocking writes should be the fastest, and an in-order queue with blocking writes should be the slowest.
But the data above shows that, whatever the type of command queue, the blocking write is roughly ten times faster than the non-blocking one.
Can anybody help me understand these numbers?