What are the main differences between TensorFlow Lite, TensorFlow-TRT and TensorRT?


I am using the Coral Dev Board and the Nvidia Jetson TX2, which is how I got to know about TensorFlow Lite, TensorFlow-TRT and TensorRT. I have some questions about them:

  1. Between TensorFlow-TRT and TensorRT: When using a fully optimised/compatible graph with TensorRT, which one is faster and why?

  2. The pipeline to use TFLite on a Google Coral (When using TensorFlow 1.x...) is (a rough sketch of the conversion steps follows the list):

a. Use a model available in TensorFlow's zoo

b. Convert the model to a frozen graph (.pb)

c. Use protobuf to serialize the graph

d. Convert to TFLite

e. Apply quantization (INT8)

f. Compile
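
For reference, steps b–f above look roughly like the following. This is only a minimal sketch: the frozen-graph file name, tensor names and input shape are placeholders for an SSD-style detection model, and the exact converter attributes vary a bit between TensorFlow releases.

```python
import numpy as np
import tensorflow as tf

# Step e needs a representative dataset so the converter can calibrate INT8 ranges.
def representative_dataset():
    for _ in range(100):
        # Placeholder: yield real preprocessed images in practice.
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

# Steps b-c: load the frozen, protobuf-serialized graph (.pb). Names/shapes are placeholders.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    'frozen_inference_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess'],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]})

# Steps d-e: convert to TFLite with full-integer (INT8) post-training quantization,
# which is what the Edge TPU requires.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())

# Step f is done on the command line with the Edge TPU compiler:
#   edgetpu_compiler model_quant.tflite
```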

What would the pipeline be when using TensorFlow-TRT and TensorRT? Is there somewhere I can find good documentation about this?

So far I think TensorRT's workflow is closer to TensorFlow Lite's, because both end with a compiled artifact that is deployed for inference:

  • TFLite: after compilation you end up with a .quant.edgetpu.tflite file, which can be used to run inference on the Dev Board

  • TensorRT: you end up with a .plan file to run inference on the device (a rough sketch of how such a file is typically produced is below).
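
For the TensorRT side, a sketch of how a .plan file is commonly built from an ONNX export of the model, using the TensorRT 7/8-era Python API (file names, workspace size and precision flag are placeholders):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Build an engine from an ONNX export of the model and serialize it to a .plan file.
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open('model.onnx', 'rb') as f:          # placeholder ONNX file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError('Failed to parse the ONNX model')

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28          # 256 MiB scratch space for tactic search
config.set_flag(trt.BuilderFlag.FP16)        # optional: FP16 on Jetson-class GPUs

engine = builder.build_engine(network, config)
with open('model.plan', 'wb') as f:
    f.write(engine.serialize())
```

The .plan file is deserialized at inference time with a TensorRT runtime; on a Jetson the same result can also be obtained with trtexec --onnx=model.onnx --saveEngine=model.plan.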

Thank you for the answers, and if you can point me to documentation that compares them, that would be appreciated.


There is 1 answer below.


TensorRT is a very fast, GPU-only CUDA runtime. I am using an Nvidia Jetson Xavier NX with TensorFlow models converted to TensorRT and run through the TensorFlow-TRT (TF-TRT) runtime. The benefit of the TF-TRT runtime is that any operations TensorRT does not support fall back to running in TensorFlow.
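
For example, a TF-TRT conversion of a SavedModel looks roughly like this (a minimal sketch using the TF 2.x API; the input and output paths are placeholders):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Rewrite the SavedModel: TensorRT-compatible subgraphs are replaced by TRT engines,
# and everything TensorRT cannot handle stays as ordinary TensorFlow ops.
converter = trt.TrtGraphConverterV2(input_saved_model_dir='saved_model')  # placeholder path
converter.convert()
converter.save('saved_model_trt')  # placeholder output path

# The converted SavedModel is loaded and run like any other TensorFlow model;
# precision (e.g. FP16/INT8) can be set via the converter's conversion parameters.
```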

I have not tried TensorFlow Lite, but I understand it to be a reduced version of TensorFlow for inference only on "small devices". It can use the GPU, but only for a limited set of operations, and I think there are no Python bindings (currently).