How to speed up the training of an RNN model with multiple GPUs in TensorFlow?
For example, the RNN is a dynamic 3-layer bidirectional LSTM with a hidden vector size of 200 (tf.nn.bidirectional_dynamic_rnn), and I have 4 GPUs available to train the model. I saw a post that applies data parallelism to subsets of the samples in a batch, but that didn't speed up training.
Asked by Maosi Chen
There is 1 answer below.
You can also try model parallelism. One way to do this is to make a cell wrapper like this, which will create cells on a specific device:
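The snippet that followed this sentence did not survive on this page. A minimal sketch of such a wrapper, not the answerer's original code, might look like the following. The name `DeviceCellWrapper` and the `device_scope` parameter are hypothetical. It assumes the TF 1.x cell interface (`state_size`, `output_size`, `zero_state`, `__call__`), and that your TF version accepts duck-typed cells. If it insists on a real `RNNCell`, subclass `tf.nn.rnn_cell.RNNCell` instead.

```python
class DeviceCellWrapper(object):
    """Duck-typed RNN cell that builds the wrapped cell's ops on one device.

    Hypothetical helper, written against the TF 1.x cell interface
    (state_size, output_size, zero_state, __call__).
    """

    def __init__(self, cell, device, device_scope=None):
        if device_scope is None:
            # Assumption: TensorFlow 1.x is installed. Imported lazily so the
            # class can be exercised without TensorFlow by passing a scope.
            import tensorflow as tf
            device_scope = tf.device
        self._cell = cell
        self._device = device
        self._device_scope = device_scope

    @property
    def state_size(self):
        return self._cell.state_size

    @property
    def output_size(self):
        return self._cell.output_size

    def zero_state(self, batch_size, dtype):
        with self._device_scope(self._device):
            return self._cell.zero_state(batch_size, dtype)

    def __call__(self, inputs, state, scope=None):
        # Variables and ops of the wrapped cell are created inside this scope.
        with self._device_scope(self._device):
            return self._cell(inputs, state, scope=scope)
```

With TensorFlow available, usage would be along the lines of `DeviceCellWrapper(tf.nn.rnn_cell.LSTMCell(200), "/gpu:0")`. TF 1.x also ships a built-in `tf.nn.rnn_cell.DeviceWrapper(cell, device)` that serves the same purpose, so a hand-written wrapper may not be needed at all.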
Then place each individual layer onto a dedicated GPU:
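The placement code is also missing from this page, so here is a sketch under stated assumptions rather than the original snippet. It assumes TF 1.x, uses the built-in `tf.nn.rnn_cell.DeviceWrapper` for pinning, and interprets the "3-layer bidirectional LSTM" from the question as one `MultiRNNCell` stack per direction. The helper names `plan_layer_devices` and `build_bidirectional_lstm` are hypothetical. With 6 cells and 4 GPUs, a strict one-layer-per-GPU mapping is not possible, so the plan assigns devices round-robin.

```python
def plan_layer_devices(num_layers, num_gpus, directions=("fw", "bw")):
    """Round-robin map of (direction, layer index) -> device string."""
    plan = {}
    slot = 0
    for direction in directions:
        for layer in range(num_layers):
            plan[(direction, layer)] = "/gpu:%d" % (slot % num_gpus)
            slot += 1
    return plan


def build_bidirectional_lstm(inputs, seq_len, num_layers=3, num_units=200,
                             num_gpus=4):
    """Builds the graph with each LSTM layer pinned to its planned device."""
    # Assumption: TensorFlow 1.x. Imported here so the planner above can be
    # used and tested without TensorFlow installed.
    import tensorflow as tf

    plan = plan_layer_devices(num_layers, num_gpus)

    def stack(direction):
        cells = [
            tf.nn.rnn_cell.DeviceWrapper(
                tf.nn.rnn_cell.LSTMCell(num_units), plan[(direction, i)])
            for i in range(num_layers)
        ]
        return tf.nn.rnn_cell.MultiRNNCell(cells)

    # outputs is a (forward, backward) pair of tensors.
    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        stack("fw"), stack("bw"), inputs,
        sequence_length=seq_len, dtype=tf.float32)
    return outputs, states
```

Creating the session with `tf.ConfigProto(allow_soft_placement=True)` lets TensorFlow fall back to another device for any op that has no GPU kernel. Whether this layout actually speeds up training depends on how much the layers can overlap across time steps, so it is worth profiling before committing to it.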