sess.run() is too slow


The sess.run() call from the TensorFlow object detection API takes about 2.5 seconds to detect bounding boxes in a 600x600 image. How can I speed up this code?

import datetime

import numpy as np
import tensorflow as tf


def run(image, detection_graph):
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            # Define input and output tensors for detection_graph
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # The score is shown on the result image, together with the class label.
            detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')

            # The array-based representation of the image will be used later in order to
            # prepare the result image with boxes and labels on it.
            image_np = image
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            print("2")
            start_time = datetime.datetime.now()
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            end_time = datetime.datetime.now()
            diff = (end_time - start_time).total_seconds() * 1000
            print(diff)
            print("3")

            return boxes[0], scores[0]
            #print scores
            #print classes

1 answer

Your sess.run execution time is normal for the first run; after that it will probably run 100 times faster (not kidding).

The key is re-using the session. In your example, I'd add another image evaluation, measure that time, and check whether performance improves, like:

# all your prev code here
print(diff)
print("3")

image_np = image2  # get another image from somewhere
image_np_expanded = np.expand_dims(image_np, axis=0)
start_time = datetime.datetime.now()

(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
end_time = datetime.datetime.now()

diff = (end_time - start_time).total_seconds() * 1000
print("Detection #2")
print(diff)

So you don't need a GPU or smaller images (yet); just "warm up" the session and use it for all the predictions.
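For illustration, here is a minimal sketch (not part of the original answer) of how the question's run() function could be restructured so the graph and session are built once and then reused for every prediction. It assumes TensorFlow 1.x; the Detector class name and the frozen-graph path are hypothetical.

# A minimal sketch of reusing one session across calls (TensorFlow 1.x assumed).
# The Detector class and 'PATH_TO_FROZEN_GRAPH.pb' are hypothetical names.
import numpy as np
import tensorflow as tf


class Detector:
    def __init__(self, frozen_graph_path):
        # Build the graph and open the session once, at start-up.
        self.graph = tf.Graph()
        with self.graph.as_default():
            graph_def = tf.GraphDef()
            with tf.gfile.GFile(frozen_graph_path, 'rb') as f:
                graph_def.ParseFromString(f.read())
            tf.import_graph_def(graph_def, name='')
        self.sess = tf.Session(graph=self.graph)
        self.image_tensor = self.graph.get_tensor_by_name('image_tensor:0')
        self.boxes = self.graph.get_tensor_by_name('detection_boxes:0')
        self.scores = self.graph.get_tensor_by_name('detection_scores:0')
        self.classes = self.graph.get_tensor_by_name('detection_classes:0')
        self.num = self.graph.get_tensor_by_name('num_detections:0')

    def detect(self, image_np):
        # Every call after the first reuses the already "warm" session.
        image_np_expanded = np.expand_dims(image_np, axis=0)
        boxes, scores, classes, num = self.sess.run(
            [self.boxes, self.scores, self.classes, self.num],
            feed_dict={self.image_tensor: image_np_expanded})
        return boxes[0], scores[0]


# Usage: only the first detect() call pays the warm-up cost.
# detector = Detector('PATH_TO_FROZEN_GRAPH.pb')
# boxes, scores = detector.detect(image1)
# boxes, scores = detector.detect(image2)

With this structure, only the first detect() call is slow; subsequent calls should take on the order of the later timings shown below.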

I currently have a really modest setup in a test environment: the latest version of Ubuntu running on VirtualBox, a single core, and no GPU (MobileNet2 + COCO dataset). The times I get are pretty decent once the session is "warm":

--- 3.7862255573272705 seconds ---
--- 0.21631121635437012 seconds ---
--- 0.1784508228302002 seconds ---

Note the slow first execution time; the last measurement was for an image of size 1050x600.