C++ AMP Performance Issue - Dramatic Inconsistent Performance without any data movement


I have an algorithm that relies on approximately 15 different array_views and 8 different parallel_for_each AMP kernel calls. All of the data is copied to the GPU, and then a set of kernel calls is executed over and over until some threshold is reached. Note: I have not yet implemented the threshold logic, which would require copying data back to the CPU.

The complete set of kernel calls is called a cycle. Between cycles, I do not reference any of the array_view data on the CPU, so I do not expect any data movement back to the CPU. However, the cycle times are not regular. Most of the time a cycle takes 10 or so milliseconds, but every 3rd or 4th cycle (or sometimes back to back), a cycle takes 400 or 500 milliseconds to execute. The internal logic of the kernel calls is always basically the same, so the increased execution time is not due to the GPU logic.
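For reference, here is a minimal sketch of how per-cycle timings like these might be collected. It is written in standard C++ so it stands alone; `time_cycles`, `run_cycle`, and `wait_for_device` are hypothetical names, not the actual code. The sketch assumes the runtime may queue kernel launches and return before the GPU has finished, in which case the clock should only be stopped after an explicit wait. In C++ AMP that wait could be `accelerator_view::wait()` on the view the kernels run on.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Times `cycles` invocations of run_cycle and returns one duration (ms) per cycle.
//
// run_cycle       : hypothetical callable that issues all kernel calls of one cycle.
// wait_for_device : hypothetical callable that blocks until the queued GPU work is
//                   done (in C++ AMP this could call accelerator_view::wait()).
std::vector<double> time_cycles(int cycles,
                                const std::function<void()>& run_cycle,
                                const std::function<void()>& wait_for_device)
{
    assert(run_cycle && wait_for_device);  // both callables must be set

    using clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    std::vector<double> ms;
    if (cycles > 0) {
        ms.reserve(static_cast<std::size_t>(cycles));
    }

    for (int i = 0; i < cycles; ++i) {
        const auto start = clock::now();
        run_cycle();
        wait_for_device();  // stop the clock only after this cycle's work has finished
        const auto stop = clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```

Without the wait, a cycle's measured time could include work queued by earlier cycles. That is only an assumption about the measurement, not a confirmed explanation of the timings.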

What might be causing the dramatic increase in time for some cycles? See sample timings below.

Time to complete first cycle 503.278 (ms) This time includes overhead of AMP initialization, kernel compilation, and data movement to the GPU.

Time to complete 1 more cycles: 11.3105 (ms)
Time to complete 2 more cycles: 10.7455 (ms)
Time to complete 3 more cycles: 538.668 (ms)
Time to complete 4 more cycles: 13.3055 (ms)
Time to complete 5 more cycles: 14.4544 (ms)
Time to complete 6 more cycles: 12.353 (ms)
Time to complete 7 more cycles: 17.5154 (ms)
Time to complete 8 more cycles: 755.255 (ms)
Time to complete 9 more cycles: 11.7461 (ms)
Time to complete 10 more cycles: 14.6612 (ms)
Time to complete 11 more cycles: 417.788 (ms)
Time to complete 12 more cycles: 399.167 (ms)
Time to complete 13 more cycles: 12.2898 (ms)
Time to complete 14 more cycles: 16.9694 (ms)
Time to complete 15 more cycles: 151.228 (ms)
Time to complete 16 more cycles: 404.659 (ms)
Time to complete 17 more cycles: 10.4977 (ms)
Time to complete 18 more cycles: 15.7178 (ms)
Time to complete 19 more cycles: 207.768 (ms)
Time to complete 20 more cycles: 511.538 (ms)
Time to complete 21 more cycles: 14.4339 (ms)
Time to complete 22 more cycles: 252.77 (ms)
Time to complete 23 more cycles: 504.565 (ms)
Time to complete 24 more cycles: 12.6931 (ms)
Time to complete 25 more cycles: 15.5403 (ms)
Time to complete 26 more cycles: 303.68 (ms)
Time to complete 27 more cycles: 440.331 (ms)
Time to complete 28 more cycles: 8.63698 (ms)
Time to complete 29 more cycles: 13.9312 (ms)
Time to complete 30 more cycles: 755.637 (ms)
Exiting......
Press any key to continue . . .
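To make the split between fast and slow cycles easier to see, the timings above can be summarized with a small helper. This is only an illustrative sketch: `kCycleMs`, `median`, and `count_slower_than` are hypothetical names, and any cutoff passed to `count_slower_than` (for example 100 ms) is an arbitrary assumption chosen to separate the two groups, not a value from the algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-cycle timings in milliseconds, copied from the sample output (cycles 1 to 30).
const std::vector<double> kCycleMs = {
    11.3105, 10.7455, 538.668, 13.3055, 14.4544, 12.353,  17.5154, 755.255,
    11.7461, 14.6612, 417.788, 399.167, 12.2898, 16.9694, 151.228, 404.659,
    10.4977, 15.7178, 207.768, 511.538, 14.4339, 252.77,  504.565, 12.6931,
    15.5403, 303.68,  440.331, 8.63698, 13.9312, 755.637};

// Median of a non-empty list. The argument is taken by value so the
// caller's data is not reordered by the sort.
double median(std::vector<double> values)
{
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    return (n % 2 == 1) ? values[n / 2]
                        : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// Number of cycles that took longer than threshold_ms.
std::size_t count_slower_than(const std::vector<double>& values, double threshold_ms)
{
    return static_cast<std::size_t>(
        std::count_if(values.begin(), values.end(),
                      [threshold_ms](double t) { return t > threshold_ms; }));
}
```

Calling `median(kCycleMs)` and `count_slower_than(kCycleMs, 100.0)` gives a quick picture of the typical cycle time and of how many cycles fall into the slow group.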
