TF object detection: return subset of inference payload


Problem

I'm working on training and deploying an instance segmentation model using TF's object detection API. I'm able to successfully train the model, package it into a TF Serving Docker image (latest tag as of Oct 2020), and process inference requests via the REST interface. However, the amount of data returned from an inference request is very large (hundreds of MB). This is a big problem when the inference request and processing don't happen on the same machine, because all of that returned data has to go over the network.

Is there a way to trim down the number of outputs (either during model export or within the TF Serving image) to allow faster round-trip times during inference?

Details

I'm using the TF OD API (with TF2) to train a Mask R-CNN model, which is a modified version of this config. I believe the full list of outputs is described in code here. The list of items I get during inference is also pasted below. For a model with 100 object proposals, that information is ~270 MB if I just write the returned inference response to disk as JSON.

inference_payload['outputs'].keys()

dict_keys(['detection_masks', 'rpn_features_to_crop', 'detection_anchor_indices', 'refined_box_encodings', 'final_anchors', 'mask_predictions', 'detection_classes', 'num_detections', 'rpn_box_predictor_features', 'class_predictions_with_background', 'proposal_boxes', 'raw_detection_boxes', 'rpn_box_encodings', 'box_classifier_features', 'raw_detection_scores', 'proposal_boxes_normalized', 'detection_multiclass_scores', 'anchors', 'num_proposals', 'detection_boxes', 'image_shape', 'rpn_objectness_predictions_with_background', 'detection_scores'])

I already encode the images within my inference requests as base64, so the request payload is not too large when going over the network. It's just that the inference response is gigantic in comparison. I only need 4 or 5 of the items out of this response, so it'd be great to exclude the rest and avoid passing such a large package of bits over the network.
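
For reference, here's a minimal sketch of that round trip, assuming the model was exported with an encoded-image (base64) input signature and is served under a hypothetical name (my_model) on TF Serving's default REST port; the endpoint, file name, and list of needed keys are placeholders, and the exact request shape depends on the exported input signature:

import base64

import requests

# Hypothetical endpoint: TF Serving's REST API listens on port 8501 by default.
SERVER_URL = "http://localhost:8501/v1/models/my_model:predict"

# The handful of outputs that are actually needed downstream.
NEEDED_KEYS = ["num_detections", "detection_boxes", "detection_scores",
               "detection_classes", "detection_masks"]

# Base64-encode the image for the request body (columnar "inputs" format).
with open("test_image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(SERVER_URL, json={"inputs": [{"b64": encoded}]})
response.raise_for_status()

# This is the response that balloons to hundreds of MB for Mask R-CNN.
print("response size: %.1f MB" % (len(response.content) / 1e6))

# Only a few of the returned keys are ever used afterwards.
outputs = response.json()["outputs"]
trimmed = {k: outputs[k] for k in NEEDED_KEYS if k in outputs}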

Things I've tried

  1. I've tried setting the score_threshold to a higher value during the export (code example here) to reduce the number of outputs. However, this seems to just threshold the detection_scores. All the extraneous inference information is still returned.
  2. I also tried just manually excluding some of these inference outputs by adding the names of keys to remove here. That also didn't seem to have any effect, and I'm worried this is a bad idea because some of those keys might be needed during scoring/evaluation.
  3. I also searched here and on the tensorflow/models repo, but I wasn't able to find anything.

There are 4 answers below.

Accepted answer

I was able to find a hacky workaround. In the export process (here), some of the components of the prediction dict are deleted. I added additional items to the non_tensor_predictions list, which contains all keys that will get removed during the postprocess step. Augmenting this list cut down my inference outputs from ~200MB to ~12MB.

Full code for the if self._number_of_stages == 3 block:

    if self._number_of_stages == 3:

      non_tensor_predictions = [
          k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]

      # Add additional keys to delete during postprocessing
      non_tensor_predictions = non_tensor_predictions + [
          'raw_detection_scores', 'detection_multiclass_scores', 'anchors',
          'rpn_objectness_predictions_with_background', 'detection_anchor_indices',
          'refined_box_encodings', 'class_predictions_with_background',
          'raw_detection_boxes', 'final_anchors', 'rpn_box_encodings',
          'box_classifier_features']
      
      for k in non_tensor_predictions:
        tf.logging.info('Removing {0} from prediction_dict'.format(k))
        prediction_dict.pop(k)

      return prediction_dict

I think there's a more "proper" way to deal with this using signature definitions during the creation of the TF Serving image, but this worked for a quick and dirty fix.

Answer

I've run into the same problem. The exporter_main_v2 code states that the outputs should be:

and the following output nodes returned by the model.postprocess(..):
  * `num_detections`: Outputs float32 tensors of the form [batch]
      that specifies the number of valid boxes per image in the batch.
  * `detection_boxes`: Outputs float32 tensors of the form
      [batch, num_boxes, 4] containing detected boxes.
  * `detection_scores`: Outputs float32 tensors of the form
      [batch, num_boxes] containing class scores for the detections.
  * `detection_classes`: Outputs float32 tensors of the form
      [batch, num_boxes] containing classes for the detections.

I've submitted an issue on the TensorFlow object detection GitHub repo; I hope we'll get feedback from the TensorFlow dev team.

The GitHub issue can be found here.

Answer

If you are using the exporter_main_v2.py file to export your model, you can try this hacky way to solve the problem.

Just add the following code to the _run_inference_on_images function in the exporter_lib_v2.py file:

    detections[classes_field] = (
        tf.cast(detections[classes_field], tf.float32) + label_id_offset)

    ############# START ##########
    # Drop the outputs that are not needed in the serving response.
    ignored_model_output_names = ["raw_detection_boxes", "raw_detection_scores"]
    for key in ignored_model_output_names:
      if key in detections:
        del detections[key]
    ############# END ##########

    for key, val in detections.items():
      detections[key] = tf.cast(val, tf.float32)

Therefore, the generated model will not output the values of ignored_model_output_names.

Please let me know if this can solve your problem.
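
As a sanity check after re-running exporter_main_v2.py with this patch, you can load the exported SavedModel and confirm the ignored keys are gone from the serving signature. A small sketch, assuming the standard exporter output layout and a hypothetical output directory:

import tensorflow as tf

# Path produced by exporter_main_v2.py via --output_directory (hypothetical here);
# the exporter writes the SavedModel into a saved_model/ subdirectory.
exported = tf.saved_model.load("exported_model/saved_model")

# The keys listed in ignored_model_output_names should no longer appear here.
serving_fn = exported.signatures["serving_default"]
print(sorted(serving_fn.structured_outputs.keys()))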

Answer

Another approach would be to alter the signatures of the saved model:

from os import path

import tensorflow as tf

# Load the exported SavedModel (an EfficientDet model from the TF2 detection zoo here).
model = tf.saved_model.load(path.join("models", "efficientdet_d7_coco17_tpu-32", "saved_model"))

# Grab the serving signature and drop the outputs that aren't needed.
infer = model.signatures["serving_default"]
outputs = infer.structured_outputs
for o in ["raw_detection_boxes", "raw_detection_scores"]:
    outputs.pop(o)

# Re-save the model with the trimmed signature.
tf.saved_model.save(
    model,
    export_dir="export",
    signatures={"serving_default": infer},
    options=None
)
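
One deployment detail worth adding (an assumption about the serving setup, not part of the snippet above): TF Serving expects each SavedModel to sit inside a numeric version subdirectory under the model base path, so when re-saving for the Docker image it's convenient to export to something like export/1 instead. Continuing from the snippet above (model and infer already defined):

from os import path

import tensorflow as tf

# Re-save under a numeric version directory so TF Serving can pick it up
# directly (it scans <model_base_path>/<version>/ for SavedModels).
tf.saved_model.save(
    model,
    export_dir=path.join("export", "1"),
    signatures={"serving_default": infer},
)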