I am running some benchmark tests using OpenCL on an AMD Radeon HD 7870.
The code, written in JOCL (the Java bindings for OpenCL), simply adds two 2D arrays (z = x + y), but it repeats the addition many times (z = x + y + y + y + y + y + y ...).
Each array is 500 by 501, and I loop over the number of times the arrays are added together on the GPU. So first I add them once, then ten times, then one thousand times, and so on.
The maximum number of iterations is 100,000,000. Below is the log file from a run (counter is the number of times my program executes in 5 seconds):
Number of Iterations: 1
Counter: 87
FLOPS Rate: 0.0043310947 GFLOPs/s
Number of Iterations: 10
Counter: 88
FLOPS Rate: 0.043691948 GFLOPs/s
Number of Iterations: 100
Counter: 84
FLOPS Rate: 0.41841218 GFLOPs/s
Number of Iterations: 1000
Counter: 71
FLOPS Rate: 3.5104263 GFLOPs/s
Number of Iterations: 10000
Counter: 8
FLOPS Rate: 3.8689642 GFLOPs/s
Number of Iterations: 100000
Counter: 62
FLOPS Rate: 309.70895 GFLOPs/s
Number of Iterations: 1000000
Counter: 17
FLOPS Rate: 832.0814 GFLOPs/s
Number of Iterations: 10000000
Counter: 2
FLOPS Rate: 974.4635 GFLOPs/s
Number of Iterations: 100000000
Counter: 1
FLOPS Rate: 893.7945 GFLOPs/s
Do these numbers make sense? I feel that 0.97 TeraFLOPS is quite high and that I must be calculating the number of FLOPs incorrectly.
Also, I would expect the calculated FLOPS to level off at some point as the number of iterations increases, but that is not so evident here. It seems that if I keep increasing the number of iterations, the calculated FLOPS will keep increasing too, which also leads me to believe that I am doing something wrong.
Just for reference, I am calculating the FLOPS in the following way:
FLOPS = counter(500)(501)(iterations)/(time_elapsed)
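To make the formula concrete, here is a minimal Java sketch of it. It assumes one floating-point add per array element per iteration, and the class and method names (`FlopsFormula`, `gflops`) are made up for illustration; they are not taken from my actual benchmark code. All arithmetic is done in double.

```java
// Hypothetical helper; the names are illustrative, not from the real benchmark.
class FlopsFormula {
    // Assumes one floating-point add per element per iteration.
    static double gflops(long counter, int rows, int cols, long iterations, double seconds) {
        // Cast first so the whole product is computed in double precision.
        double flops = (double) counter * rows * cols * iterations;
        return flops / seconds / 1e9;
    }

    public static void main(String[] args) {
        // First log entry: 1 iteration, counter 87, nominal 5-second window.
        System.out.println(gflops(87, 500, 501, 1, 5.0));
    }
}
```

With exactly 5.0 seconds this gives about 0.00436 GFLOPS, close to the logged 0.00433. The small gap would be consistent with the measured elapsed time being slightly more than 5 seconds.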
Any help with this issue will be greatly appreciated.
Thank you
EDIT:
I have now run the same benchmark over a range of iteration counts (the number of times I add y to x) as well as a range of array sizes. The resulting surface plot can be seen in this GitHub repository:
https://github.com/ke0m/Senior_Design/blob/master/JOCL/Graphing/GoodGPUPlot.PNG
I have asked others for their opinion on this plot, and they tell me that while the numbers I am calculating are feasible, they are artificially high. They say this is evident in the steep slope in the plot, which does not make physical sense. One suggested explanation for the steep slope is that the compiler converts the variable that controls the iterations (of type int) to a short, which forces this number to stay below approximately 32,000 (a signed 16-bit short tops out at 32,767). That would mean I am doing less work on the GPU than I think I am, and therefore calculating a GFLOPS value that is too high.
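To show what such a conversion would do to my iteration counts, here is a small Java sketch of int-to-short narrowing. This only illustrates the arithmetic of the suggestion: I have not confirmed that the OpenCL compiler does anything like this (in OpenCL C an int is specified as 32 bits), and the class name `ShortTruncation` is made up for the example.

```java
// Illustration only: Java's int-to-short narrowing keeps the low 16 bits.
// Whether the OpenCL compiler does this to the kernel argument is unconfirmed.
class ShortTruncation {
    static short asShort(int iterations) {
        return (short) iterations;
    }

    public static void main(String[] args) {
        int[] counts = {1000, 10000, 100000, 1000000};
        for (int n : counts) {
            System.out.println(n + " -> " + asShort(n));
        }
    }
}
```

Values up to 32,767 pass through unchanged, but 100000 becomes -31072, so a loop bounded by the truncated value would do far less work than the host-side FLOPS formula assumes.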
Can anyone confirm this idea or offer any other ideas as to why the plot looks the way it does?
Thank you again
counter(500)(501)(iterations): if this is calculated with integers, the result is likely to be too large for a 32-bit integer and will overflow. If so, convert to floating point before calculating.
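A minimal Java sketch of the difference, assuming the product is formed from int values (the class and method names here are hypothetical, since the original benchmark code is not shown):

```java
// Hypothetical demo of 32-bit overflow in the FLOP count.
class OverflowDemo {
    // All-int arithmetic: wraps silently once the product exceeds 2^31 - 1.
    static int intProduct(int counter, int iterations) {
        return counter * 500 * 501 * iterations;
    }

    // Casting the first operand promotes the whole product to double.
    static double doubleProduct(int counter, int iterations) {
        return (double) counter * 500 * 501 * iterations;
    }

    public static void main(String[] args) {
        int counter = 1;
        int iterations = 100_000_000; // largest iteration count in the question
        System.out.println("int:    " + intProduct(counter, iterations));
        System.out.println("double: " + doubleProduct(counter, iterations));
    }
}
```

For the largest case in the question the true product is 2.505e13, well beyond the int maximum of 2,147,483,647, so the int version prints a wrapped value while the double version stays correct.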