Does `tf.nn.ctc_beam_search_decoder()` not support the GPU in TensorFlow 2?

I am trying to run `tf.nn.ctc_beam_search_decoder()` on the GPU, but it does not use the GPU.

I was able to confirm that other TensorFlow ops (e.g. Reshape and SigmoidGrad) run on the GPU.
However, some ops, including `ctc_beam_search_decoder()`, run only on the CPU, and `ctc_beam_search_decoder()` is slow.

So I have two questions.
First, does `ctc_beam_search_decoder()` not support the GPU in TensorFlow 2?
Second, if it is supported, how can I make it run on the GPU (which function or method should I use)?

Here is a simple example.

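Code:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# Log the device each op is executed on.
tf.debugging.set_log_device_placement(True)
# List the visible devices (device_lib is a private module;
# tf.config.list_physical_devices() is the public alternative).
print(device_lib.list_local_devices())

# Logits for 5 time steps and 5 classes (the last class is the CTC blank).
inputs = tf.convert_to_tensor([
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.2, 0.0, 0.3, 0.1, 0.1],
    [0.2, 0.21, 0.3, 0.4, 0.1],
    [0.2, 0.0, 0.6, 0.1, 0.5],
    [0.2, 1.2, 0.3, 2.1, 0.1]])

# The decoder expects [max_time, batch_size, num_classes], so add a batch axis.
inputs = tf.expand_dims(inputs, axis=1)
inputs_len = tf.convert_to_tensor([5])

decoded, _ = tf.nn.ctc_beam_search_decoder(inputs, inputs_len)
```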

Output (stdout):

```
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 714951449022474384
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11733532016050292601
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 394441871956590417
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11150726272
locality {
  bus_id: 1
  links {
  }
}
incarnation: 5917663253173554940
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"
]
Executing op ExpandDims in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op CTCBeamSearchDecoder in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0
```

Ignore the input and output data and focus on the device each op ran on.
Here, ExpandDims and StridedSlice were executed on the GPU, but CTCBeamSearchDecoder was executed on the CPU.

Best answer

The beam search decoder is implemented in plain C++, so it runs on the CPU and not on the GPU (see the code here [1]; it is basically the same as in TF1).

Beam search is an iterative algorithm: it moves from one time step to the next, and each step depends on the beams kept at the previous one. So I don't think running it on the GPU would give much of a performance improvement. The simplest way to improve runtime is to tune the beam width (the `beam_width` argument, 100 by default): the smaller, the faster; the larger, the more accurate.
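To illustrate why the work is sequential and why the beam width drives the cost, here is a minimal pure-Python sketch of CTC prefix beam search. It is not TensorFlow's C++ implementation. It assumes logits for a single sequence (no batch dimension), treats the last class as the blank (the convention `tf.nn.ctc_beam_search_decoder` uses), and works in plain probabilities rather than log space, so it may underflow on long inputs. The names `softmax` and `prefix_beam_search` are hypothetical helpers for illustration only.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Convert one time step of logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prefix_beam_search(logits, beam_width=10, blank=None):
    """Simplified CTC prefix beam search over a [time][num_classes] list of logits.

    Returns the most probable collapsed label sequence as a tuple of class indices.
    The blank defaults to the last class (assumption mirroring TensorFlow).
    """
    num_classes = len(logits[0])
    if blank is None:
        blank = num_classes - 1
    # Each prefix maps to (prob. of ending in blank, prob. of ending in non-blank).
    beams = {(): (1.0, 0.0)}
    for step in logits:  # Sequential: step t consumes the beams kept at step t-1.
        probs = softmax(step)
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(probs):
                if c == blank:
                    # Emitting a blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                last = prefix[-1] if prefix else None
                extended = prefix + (c,)
                if c == last:
                    # A repeated label extends the prefix only after a blank...
                    next_beams[extended][1] += p_b * p
                    # ...otherwise it collapses into the same prefix.
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[extended][1] += (p_b + p_nb) * p
        # Prune: keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(ps) for prefix, ps in ranked[:beam_width]}
    best_prefix, _ = max(beams.items(), key=lambda kv: sum(kv[1]))
    return best_prefix
```

For the question's data, this would be called on the 5x5 list of logits before `tf.expand_dims`, e.g. `prefix_beam_search(rows, beam_width=10)`. The outer loop over time steps cannot be parallelized, and the work inside each step grows with `beam_width` times the number of classes, which is the same trade-off the answer describes for the real decoder's `beam_width` argument.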

[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/ctc/ctc_beam_search.h#L159