TRT versus TF-TRT


I need to convert some models so I can deploy them on Jetson devices. I tried TensorRT for YOLOv3 trained on COCO (80 classes), but I couldn't get inference to work, so I switched to TF-TRT. That worked on my laptop: the FPS increased, but the model size and GPU memory usage didn't change. The model was 300 MB, and after conversion it got a bit bigger. Both before and after TF-TRT, the model uses 16 GB of GPU memory.

  1. Is this usual? I mean, is it OK or is something wrong? I expected a smaller model, lower GPU memory usage and higher FPS (by the way, the number of nodes did decrease).

  2. The important thing is that the FPS jumps around a lot after TF-TRT. I got around 3 FPS before TF-TRT, but after it I get 4, 6, 7, 8 or 9 FPS, and the value doesn't change smoothly: for example, I get 4 FPS on the first frame and 9 FPS on the second. I can see these jumps in the visualization over the video as well. Why does this happen, and how can I fix it?

  3. I have read that TRT has better performance than TF-TRT. Is that true? What exactly is the difference between them? I am confused.

  4. I have another model that I need to convert to TRT, but it is a PyTorch model (an Hourglass CNN). Do you know how I can do that? Is there a valid, working repo on GitHub or a tutorial on YouTube that you can share?

  5. Which is easier: TensorFlow to TRT, or PyTorch to TRT?

Thank you very much


I hope my experience matches your needs.

1 - Yes, it is usual for models that are not prepared to be heavily optimized. YOLO is a very large model, no matter whether you translate it to TRT. Pure TRT does better than TF-TRT here because with TRT the model is either 100% optimized or the conversion fails. With TF-TRT the optimization happens only on the layers that can be optimized, and the others are left as they are.

2 - Yes, you can fix it! For the Jetson Nano there is DeepStream, an optimized framework that runs the whole inference pipeline on the GPU without using the CPU to move memory (it uses TRT internally). DeepStream has an optimized YOLO demo; on a Jetson Nano I achieved 12 FPS with YOLOv3, and you also have the option of Tiny YOLO for better performance. https://www.reddit.com/r/learnmachinelearning/comments/hy50dl/a_tutorial_on_implementing_yolo_v3_with/
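Separately from DeepStream, part of the jumping you see may come from displaying an instantaneous FPS computed from a single frame's duration, which swings whenever one frame is slower than the next (TF-TRT can also be slow on the first runs while engines are being built). Below is a minimal sketch, assuming you time each processed frame in Python, that reports FPS averaged over a sliding window instead. The class name `FpsMeter` and the window size are illustrative, and this only smooths the reported number; it does not make inference itself faster.

```python
import time
from collections import deque
from typing import Optional


class FpsMeter:
    """Illustrative helper: reports FPS averaged over the last `window` frames."""

    def __init__(self, window: int = 30):
        # Durations (seconds) of the most recent frames; old entries drop off automatically.
        self.frame_times = deque(maxlen=window)
        self._last: Optional[float] = None

    def tick(self, now: Optional[float] = None) -> float:
        """Call once per processed frame; returns the smoothed FPS (0.0 until two ticks)."""
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            self.frame_times.append(now - self._last)
        self._last = now
        total = sum(self.frame_times)
        if total <= 0:
            return 0.0
        return len(self.frame_times) / total


# Simulated timestamps: frames alternate between ~4 FPS (0.25 s) and ~10 FPS (0.1 s).
meter = FpsMeter(window=3)
for t in [0.0, 0.25, 0.35, 0.60, 0.70]:
    fps = meter.tick(t)
print(round(fps, 2))
```

In a real loop you would call `meter.tick()` without an argument right after each frame is processed and draw the returned value on the video instead of the per-frame number.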

3 - As I mentioned before: if you translate your model to TRT from ONNX or etlt using trtexec or DeepStream, the system will optimize 100% of the layers or it will fail in the process. With TF-TRT the system does its best, but it does not guarantee that all layers are optimized for the specific hardware. TF-TRT is a better solution for custom/rare models or when you need to make a quick test.
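To make this difference concrete, here is a toy model in plain Python (not the real TF-TRT or TensorRT API) on a simple chain of ops: TF-TRT turns runs of consecutive supported ops into TRT segments and leaves the rest in TensorFlow, while a pure TRT build is all-or-nothing. The op names and the supported set are made up for illustration, and real op support depends on the TensorRT version. `minimum_segment_size` is borrowed from the TF-TRT conversion parameter of the same name, which keeps very short runs in TensorFlow.

```python
from typing import List, Set, Tuple

# Hypothetical set of ops the toy "TensorRT" supports (illustration only).
TRT_SUPPORTED: Set[str] = {"Conv2D", "BiasAdd", "Relu", "MaxPool", "Add", "ConcatV2"}

Segment = Tuple[str, List[str]]


def tf_trt_partition(ops: List[str], supported: Set[str],
                     minimum_segment_size: int = 3) -> List[Segment]:
    """Toy TF-TRT: consecutive supported ops become a TRT segment if the run is long
    enough; unsupported ops and short runs stay as TensorFlow ("TF") segments."""
    segments: List[Segment] = []
    run: List[str] = []

    def flush_run() -> None:
        if run:
            kind = "TRT" if len(run) >= minimum_segment_size else "TF"
            segments.append((kind, list(run)))
            run.clear()

    for op in ops:
        if op in supported:
            run.append(op)
        else:
            flush_run()
            segments.append(("TF", [op]))  # unsupported op falls back to TensorFlow
    flush_run()
    return segments


def pure_trt_build(ops: List[str], supported: Set[str]) -> List[Segment]:
    """Toy pure-TRT build (e.g. ONNX -> trtexec): every op must be supported or it fails."""
    unsupported = [op for op in ops if op not in supported]
    if unsupported:
        raise ValueError(f"unsupported layers: {unsupported}")
    return [("TRT", list(ops))]


graph = ["Conv2D", "BiasAdd", "Relu", "MyCustomOp", "Conv2D", "Relu"]
result = tf_trt_partition(graph, TRT_SUPPORTED)
print(result)
try:
    pure_trt_build(graph, TRT_SUPPORTED)
except ValueError as err:
    print("pure TRT build failed:", err)
```

With this graph, TF-TRT yields one TRT segment plus TensorFlow fallbacks, which is why part of the model still runs as before, while the pure TRT build refuses the whole graph.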

4/5 - In the past, if you had a PyTorch model you first needed to convert it to ONNX and then to TRT with trtexec. In the last month, with TRT 8.0, you have the possibility to use PyTorch-TRT, like TensorFlow-TRT, so today both are about the same. But if FPS performance is your concern, I recommend going from TensorFlow/PyTorch to ONNX and then to TRT with trtexec or DeepStream.
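The trtexec step of the ONNX route can be scripted. This is a minimal sketch, assuming the model has already been exported to ONNX (for a PyTorch model such as the Hourglass CNN, for example with `torch.onnx.export`). The file names `hourglass.onnx`/`hourglass.engine` and the helper names are hypothetical; `--onnx`, `--saveEngine` and `--fp16` are standard trtexec options. Because `check=True` makes a failed build raise an exception, the script also reflects the all-or-nothing behavior described in point 3.

```python
import shlex
import shutil
import subprocess
from typing import List


def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = False,
                    trtexec: str = "trtexec") -> List[str]:
    """Hypothetical helper: build a trtexec command that parses an ONNX model
    and saves the serialized TensorRT engine to engine_path."""
    cmd = [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow TensorRT to also pick FP16 kernels
    return cmd


def build_engine(onnx_path: str, engine_path: str, fp16: bool = False,
                 trtexec: str = "trtexec") -> None:
    """Hypothetical helper: run trtexec; raises if the binary is missing
    or if the build exits with a non-zero status."""
    if shutil.which(trtexec) is None:
        raise FileNotFoundError(f"{trtexec!r} not found; pass the full path to trtexec")
    subprocess.run(trtexec_command(onnx_path, engine_path, fp16, trtexec), check=True)


# Only print the command here; call build_engine(...) on a machine with TensorRT installed.
cmd = trtexec_command("hourglass.onnx", "hourglass.engine", fp16=True)
print(shlex.join(cmd))
```

If trtexec is not on your PATH (common on Jetson images), pass its full path through the `trtexec` argument.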