We trained an Ultralytics YOLOv8 model on 1024×1024, 3-channel images, exported it to ONNX, and run inference from a Visual Studio 2022 C# application (.NET Framework 4.8) with onnxruntime-gpu v1.16.3. On an RTX A5000 GPU, inference takes around 90 ms per image. We also tried various ONNX Runtime session options to reduce the inference time: graph optimization level, InterOpNumThreads, IntraOpNumThreads, execution mode (ORT_PARALLEL and ORT_SEQUENTIAL), and memory pattern optimization (EnableMemoryPattern). None of them made any measurable difference. Can anyone suggest what we might be missing, or how we can reduce the time further, even a little?
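For reference, this is roughly how we set up the session (a minimal sketch; the model path, device id, and thread counts here are placeholders, not our exact production values):

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0); // GPU device 0 (placeholder device id)
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL; // also tried ORT_PARALLEL
options.InterOpNumThreads = 1;  // placeholder; we tried several values
options.IntraOpNumThreads = 4;  // placeholder; we tried several values
options.EnableMemoryPattern = true; // also tried disabling this

var session = new InferenceSession("yolov8.onnx", options); // path is a placeholder
```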
For comparison, the same model runs in about 35 ms on an RTX 4090, while on the RTX A5000 we get around 90 ms. Both machines use the same CUDA version (11.2). Our target is 35-40 ms when we deploy on the RTX A5000.
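For completeness, this is roughly how we measure those numbers, continuing from the session created above (a sketch: "images" is the input name in our ONNX export, and we discard a few warm-up runs so one-time CUDA/cuDNN initialization is not counted):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// 1x3x1024x1024 float input, matching the training resolution.
var tensor = new DenseTensor<float>(new[] { 1, 3, 1024, 1024 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("images", tensor)
};

// Warm-up runs so first-run initialization is excluded from the timing.
for (int i = 0; i < 5; i++)
    session.Run(inputs).Dispose();

const int runs = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < runs; i++)
    using (var results = session.Run(inputs)) { }
sw.Stop();
Console.WriteLine($"Average inference: {sw.ElapsedMilliseconds / (double)runs:F1} ms");
```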