Friends! I have a question about processing with multiple GPUs. I'm using 4 GPUs and tried a simple A^n + B^n example in 3 ways, as shown below:
1. Single GPU

       with tf.device('/gpu:0'):
           # ... tf.matpow code ...

2. Multiple GPUs (sketched below)

       with tf.device('/gpu:0'):
           # ... tf.matpow code ...
       with tf.device('/gpu:1'):
           # ... tf.matpow code ...

3. No specific GPU designated (I think maybe all GPUs are used)

       # ... just tf.matpow code ...
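For reference, here is a minimal sketch of setup #2, assuming matpow is a small helper that raises a matrix to the n-th power with tf.matmul (the matrix sizes and variable names below are illustrative, not the original code):

    import numpy as np
    import tensorflow as tf

    n = 10
    A = np.random.rand(10000, 10000).astype('float32')
    B = np.random.rand(10000, 10000).astype('float32')

    def matpow(M, n):
        # Compute M^n by repeated matrix multiplication (n >= 1).
        result = M
        for _ in range(n - 1):
            result = tf.matmul(result, M)
        return result

    # A^n on GPU 0, B^n on GPU 1, and the final sum on the CPU.
    with tf.device('/gpu:0'):
        a = tf.placeholder(tf.float32, [10000, 10000])
        c1 = matpow(a, n)
    with tf.device('/gpu:1'):
        b = tf.placeholder(tf.float32, [10000, 10000])
        c2 = matpow(b, n)
    with tf.device('/cpu:0'):
        s = c1 + c2

    with tf.Session() as sess:
        sess.run(s, feed_dict={a: A, b: B})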
When I tried this, the results made no sense to me:

1. Single GPU: 6.x seconds
2. Multiple GPUs (2 GPUs): 2.x seconds
3. No specific GPU designated (maybe all 4 GPUs): 4.x seconds
I cannot understand why #2 is faster than #3. Can anyone help me?
Thanks.
While the TensorFlow scheduler works well for a single GPU, it is not yet as good at optimizing the placement of computations across multiple GPUs. (Although it is being worked on presently.) Without further details, it's hard to know exactly what's going on. To get a better picture, you can log where the computations are actually being placed by the scheduler. You can do this by setting the log_device_placement flag when creating the tf.Session.
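For example (using the TensorFlow 1.x Session API):

    import tensorflow as tf

    # Creates a session that logs which device each operation is assigned to.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The assigned device for each operation is then printed to stderr as the graph executes, which should show whether case #3 is really spreading the work across all 4 GPUs.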