How to speed up the training of an RNN model with multiple GPUs in TensorFlow?

1.3k Views Asked by Maosi Chen At 11 December 2017 at 23:09

For example, the RNN is a dynamic 3-layer bidirectional LSTM with a hidden vector size of 200 (tf.nn.bidirectional_dynamic_rnn), and I have 4 GPUs to train the model. I saw a post that uses data parallelism on subsets of the samples in a batch, but that didn't speed up training.

Maxim On 12 December 2017 at 12:02 BEST ANSWER

You can also try model parallelism. One way to do this is to write a cell wrapper like the one below, which runs the wrapped cell's operations on a specific device:

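import tensorflow as tf  # TensorFlow 1.x API (tf.nn.rnn_cell)

class DeviceCellWrapper(tf.nn.rnn_cell.RNNCell):
  """Wraps an RNN cell so that its operations are placed on a given device."""

  def __init__(self, cell, device):
    super(DeviceCellWrapper, self).__init__()
    self._cell = cell
    self._device = device

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    return self._cell.output_size

  def __call__(self, inputs, state, scope=None):
    # Every op created by the wrapped cell is pinned to self._device.
    with tf.device(self._device):
      return self._cell(inputs, state, scope)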

Then place each individual layer onto a dedicated GPU:

# n_neurons (hidden size) and X (input tensor) are assumed to be defined elsewhere.
cell_fw = DeviceCellWrapper(cell=tf.nn.rnn_cell.LSTMCell(num_units=n_neurons, state_is_tuple=False), device='/gpu:0')
cell_bw = DeviceCellWrapper(cell=tf.nn.rnn_cell.LSTMCell(num_units=n_neurons, state_is_tuple=False), device='/gpu:1')
outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, X, dtype=tf.float32)
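To extend this placement to the 3-layer, 4-GPU setup from the question, one option is to decide up front which device each (layer, direction) pair should use. The sketch below is only an illustration: the helper name assign_devices and the round-robin policy are assumptions, not part of the answer above or of TensorFlow. Each returned string could be passed as the device argument of DeviceCellWrapper.

```python
def assign_devices(num_layers, num_gpus):
    """Map each (layer, direction) pair to a device string, round-robin.

    Hypothetical helper for illustration; the policy is an assumption.
    """
    if num_layers < 1 or num_gpus < 1:
        raise ValueError("num_layers and num_gpus must both be >= 1")
    plan = {}
    slot = 0
    for layer in range(num_layers):
        # The forward and backward cells of one layer do not depend on
        # each other, so they are given consecutive device slots.
        for direction in ("fw", "bw"):
            plan[(layer, direction)] = "/gpu:%d" % (slot % num_gpus)
            slot += 1
    return plan


# Setup from the question: 3 bidirectional layers, 4 GPUs.
plan = assign_devices(num_layers=3, num_gpus=4)
for key in sorted(plan):
    print(key, plan[key])
```

With at least two GPUs, the two directions of a layer always land on different devices under this policy, so they can run concurrently. Whether that yields a real speedup depends on the cost of moving activations between devices, so it is worth measuring.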
