I am training a "faster_rcnn_inception_resnet_v2_atrous_coco" model for custom object detection using TensorFlow's Object Detection API.
I set up a machine on Azure with the following configuration:

- CPU: Intel Xeon E5-2690 v3 @ 2.60 GHz
- RAM: 56 GB
- OS: Windows 10 64-bit
- GPU: Tesla K80, 11.18 GB total memory
When I run train.py I get the following speed per step:
INFO:tensorflow:global step 458: loss = 0.5601 (3.000 sec/step)
I1009 19:30:13.254615 5916 tf_logging.py:115] global step 458: loss = 0.5601 (3.000 sec/step)
INFO:tensorflow:global step 459: loss = 0.5724 (3.077 sec/step)
I1009 19:30:16.331734 5916 tf_logging.py:115] global step 459: loss = 0.5724 (3.077 sec/step)
INFO:tensorflow:global step 460: loss = 0.8615 (3.018 sec/step)
I1009 19:30:19.350132 5916 tf_logging.py:115] global step 460: loss = 0.8615 (3.018 sec/step)
INFO:tensorflow:global step 461: loss = 0.6021 (3.062 sec/step)
I1009 19:30:22.428256 5916 tf_logging.py:115] global step 461: loss = 0.6021 (3.062 sec/step)
Is this fast enough, or should it be faster since it is running on a GPU? The batch size in the config file is 1; when I change it to 2 or higher, training runs out of memory.
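For reference, the batch size I am changing lives in the train_config block of the pipeline config (field name as in the API's sample configs):

```proto
train_config: {
  # 1 trains; 2 or more exhausts the K80's memory with this model
  batch_size: 1
}
```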
It takes 3 seconds per step on a dataset of 93 images. After training, when I load the frozen graph and run prediction over all the images, it takes about 1 second per image, even with the GPU. That seems too slow. What am I doing wrong?
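For context, this is roughly how I run prediction (a sketch, not my exact script: the tensor names are the standard ones written by the Object Detection API's exporter, and the frozen-graph path is up to you). The session is created once and reused for every image, since rebuilding it per image is far slower:

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph/session API

tf.disable_eager_execution()

def load_frozen_graph(path):
    """Deserialize an exported frozen GraphDef into a new tf.Graph."""
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with open(path, "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")
    return graph

def run_detection(graph_path, images):
    """Run the detector over a list of HxWx3 uint8 numpy arrays.

    The output tensor names below are the standard ones produced by
    the Object Detection API exporter. One tf.Session serves all
    images; only sess.run() is repeated inside the loop.
    """
    graph = load_frozen_graph(graph_path)
    image_tensor = graph.get_tensor_by_name("image_tensor:0")
    outputs = [graph.get_tensor_by_name(n) for n in
               ("detection_boxes:0", "detection_scores:0",
                "detection_classes:0", "num_detections:0")]
    results = []
    with tf.Session(graph=graph) as sess:
        for image in images:
            # The model expects a batched input, hence expand_dims.
            results.append(sess.run(
                outputs,
                feed_dict={image_tensor: np.expand_dims(image, 0)}))
    return results
```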