CPU-GPU FLOP Rate


I need to calculate how many FLOPs per transferred value a code has to perform so that running it on the GPU is actually worthwhile, i.e. improves performance.

Here are the given rates and assumptions:

1. A PCIe 3.0 x16 bus can transfer data from the CPU to the GPU at 15.75 GB/s.

2. The GPU can perform 8 TFLOP/s in single precision.

3. The CPU can perform 400 GFLOP/s in single precision.

4. A single-precision floating-point number occupies 4 bytes.

5. Calculation can overlap with data transfers.

6. The data initially resides in CPU memory.

How would a problem like this be solved step by step?


Interpreting assumption 5 to mean that the CPU isn't slowed down in any way by transferring data to the GPU, there is obviously no reason not to use the GPU: you can only gain.
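
A minimal Python sketch of that overlapped case (the function names and the example sizes are my own; the rates are the ones stated in the question): the GPU's share is chosen so that its transfer plus compute finishes exactly when the CPU finishes the rest, which can never be slower than the CPU working alone.

    # Overlapped case (assumption 5): the transfer is hidden behind CPU compute.
    PCIE_BW  = 15.75e9   # bytes/s, CPU -> GPU over PCIe 3.0 x16
    GPU_RATE = 8e12      # single-precision FLOP/s
    CPU_RATE = 400e9     # single-precision FLOP/s

    def cpu_only_time(d_bytes, ci):
        """Time for the CPU to process all d_bytes at ci FLOP/byte."""
        return ci * d_bytes / CPU_RATE

    def overlapped_time(d_bytes, ci):
        """Choose d_gpu so the CPU finishes its remainder exactly when the
        GPU finishes transfer + compute; the CPU never stalls."""
        # Balance ci*(d - d_gpu)/CPU_RATE = d_gpu/PCIE_BW + ci*d_gpu/GPU_RATE.
        d_gpu = (ci * d_bytes / CPU_RATE) / (1 / PCIE_BW + ci / GPU_RATE + ci / CPU_RATE)
        return ci * (d_bytes - d_gpu) / CPU_RATE

    d = 1e9  # 1 GB of data, arbitrary example size
    for ci in (0.1, 1.0, 10.0, 100.0):  # FLOP/byte
        print(ci, cpu_only_time(d, ci), overlapped_time(d, ci))

For any ci > 0 the overlapped time comes out below the CPU-only time, which is exactly the "you can only gain" point.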

The question gets more interesting if assumption 5 is not taken into account. Assuming the CPU cannot calculate while it is transferring data to the GPU, we arrive at this: I think you are looking for the computational intensity (=: ci, in FLOP/byte) at which it becomes beneficial to let the CPU halt its calculation and transfer data so that the GPU can participate.

Say you have d bytes of data to process with an algorithm of computational intensity ci. You split the data into d_cpu and d_gpu with d_cpu + d_gpu = d. It takes t_1 = d_gpu / (15.75 GB/s) to transfer the GPU's share. Then you let both compute for t_2, meaning t_2 = ci * d_gpu / (8 TFLOP/s) = ci * d_cpu / (400 GFLOP/s). The total time is t_3 = t_1 + t_2.

If the CPU does it all alone, it needs t_4 = ci * d / (400 GFLOP/s).
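
Here is a small sketch of t_3 and t_4 as just defined (again with my own function names, reusing the rates from the question):

    # No-overlap model: the CPU stalls while d_gpu is transferred (t_1),
    # then CPU and GPU compute concurrently on their shares (t_2).
    PCIE_BW  = 15.75e9   # bytes/s
    GPU_RATE = 8e12      # single-precision FLOP/s
    CPU_RATE = 400e9     # single-precision FLOP/s

    def split_time(d_bytes, ci):
        """t_3 = t_1 + t_2, with the split balanced so both sides finish
        together: d_gpu / GPU_RATE = d_cpu / CPU_RATE."""
        d_gpu = d_bytes * GPU_RATE / (GPU_RATE + CPU_RATE)
        t_1 = d_gpu / PCIE_BW          # transfer, CPU idle
        t_2 = ci * d_gpu / GPU_RATE    # == ci * d_cpu / CPU_RATE
        return t_1 + t_2

    def cpu_only_time(d_bytes, ci):
        """t_4: the CPU processes everything by itself."""
        return ci * d_bytes / CPU_RATE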

So the point where both options take the same time is at

t_3 = t_4
t_1 + t_2 = t_4
d_gpu / (15.75 GB/s) + ci * d_gpu / (8 TFLOP/s) = ci * (d_cpu + d_gpu) / (400 GFLOP/s)

with

d_gpu / (8 TFLOP/s) = d_cpu / (400 GFLOP/s)

Multiplying that constraint by ci shows that ci * d_gpu / (8 TFLOP/s) on the left equals ci * d_cpu / (400 GFLOP/s) on the right, so those terms cancel, leaving

d_gpu / (15.75 GB/s) = ci * d_gpu / (400 GFLOP/s)

resulting in

ci = (400 GFLOP/s) / (15.75 GB/s) ~= 25.4 FLOP/byte

or roughly 100 FLOP per transferred 4-byte single-precision value.
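
Plugging numbers into the sketch above (assuming the split_time and cpu_only_time helpers from the previous snippet) confirms the break-even point:

    d = 1e9                              # 1 GB, arbitrary example size
    ci_breakeven = CPU_RATE / PCIE_BW    # 400e9 / 15.75e9 ~= 25.4 FLOP/byte
    print(ci_breakeven)
    print(split_time(d, ci_breakeven))   # ~= cpu_only_time(d, ci_breakeven)
    print(cpu_only_time(d, ci_breakeven))
    # Above ~25.4 FLOP/byte (~100 FLOP per 4-byte value) the CPU+GPU split wins;
    # below it, the CPU alone is faster.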