I successfully retrained a RetinaNet network (https://github.com/fizyr/keras-retinanet) on my own dataset. Everything works as expected when I run it against a video stream: objects are detected, but performance is poor on a Jetson Xavier, so I'm trying to move it to TensorRT, and that's where I got confused :-) I converted the .h5 to .pb and used the TF/NVIDIA tooling to optimize the model. With Keras, getting the bounding boxes and displaying them is straightforward:
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
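For context, this is roughly how I consume those outputs. A minimal sketch, assuming the standard keras-retinanet inference model, whose outputs are sorted by score and whose unused slots are (as far as I know) padded with -1; the helper name is mine:

```python
import numpy as np

def filter_detections(boxes, scores, labels, score_thresh=0.5):
    """Drop low-confidence and padded (-1) detections from a batch of one image."""
    mask = scores[0] >= score_thresh
    return boxes[0][mask], scores[0][mask], labels[0][mask]
```

Each surviving box is then an [x1, y1, x2, y2] array ready to draw on the frame.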
With the TensorFlow model I use a construct like the one below:
input_names = ['input_1']
output_names = ['classification/concat', 'regression/concat']

tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_classification = tf_sess.graph.get_tensor_by_name(output_names[0] + ':0')
tf_regression = tf_sess.graph.get_tensor_by_name(output_names[1] + ':0')

# Run both output heads in one pass; image[None, ...] adds the batch dimension.
classification, regression = tf_sess.run(
    [tf_classification, tf_regression],
    feed_dict={tf_input: image[None, ...]})
'regression' and 'classification' come back populated with the expected shapes. The question now is: how do I map 'regression' and 'classification' to bounding boxes, scores, and labels?
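For what it's worth, here is a NumPy-only sketch of the post-processing I believe the Keras inference model appends (what keras-retinanet's RegressBoxes and FilterDetections layers do). It assumes `anchors` is an (N, 4) array of [x1, y1, x2, y2] anchors matching the input size (the generation of those anchors from the default sizes/strides/ratios/scales is not shown), `regression` has shape (1, N, 4) with deltas normalized with mean 0 and std 0.2 (the keras-retinanet default), and `classification` has shape (1, N, num_classes). All function names are mine, and this is not verified against the library's own implementation:

```python
import numpy as np

def decode_boxes(anchors, deltas, std=0.2):
    """Apply normalized regression deltas to anchor corners."""
    widths = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    x1 = anchors[:, 0] + deltas[:, 0] * std * widths
    y1 = anchors[:, 1] + deltas[:, 1] * std * heights
    x2 = anchors[:, 2] + deltas[:, 2] * std * widths
    y2 = anchors[:, 3] + deltas[:, 3] * std * heights
    return np.stack([x1, y1, x2, y2], axis=1)

def nms(boxes, scores, iou_thresh=0.5):
    """Plain NumPy non-maximum suppression; returns indices of kept boxes."""
    order = scores.argsort()[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best remaining box against the rest
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-8)
        order = order[1:][iou <= iou_thresh]
    return np.array(keep, dtype=int)

def postprocess(anchors, regression, classification, score_thresh=0.05):
    """Map raw (regression, classification) outputs to (boxes, scores, labels)."""
    boxes = decode_boxes(anchors, regression[0])
    scores = classification[0].max(axis=1)     # best class score per anchor
    labels = classification[0].argmax(axis=1)  # best class index per anchor
    mask = scores > score_thresh               # discard low-confidence anchors
    boxes, scores, labels = boxes[mask], scores[mask], labels[mask]
    keep = nms(boxes, scores)
    return boxes[keep], scores[keep], labels[keep]
```

Is this roughly the right approach, or does the library decode its outputs differently?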