We trained an Ultralytics YOLOv8 model on 1024×1024, 3-channel images, exported it to ONNX, and run inference from a Visual Studio 2022 C# application (.NET Framework 4.8) with onnxruntime-gpu v1.16.3. On an RTX A5000 GPU, inference takes around 90 ms per image. We also tried various ONNX Runtime session options to reduce the inference time: graph optimization level, InterOpNumThreads, IntraOpNumThreads, execution mode (ORT_PARALLEL and ORT_SEQUENTIAL), and memory pattern optimization (EnableMemoryPattern). None of them made any measurable difference. Can anyone suggest what we might be missing, or how we can reduce the time further, even a little?
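For reference, this is roughly how we set up the session (a minimal sketch; the model path, device id, and thread counts here are placeholders, not our exact production values):

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0); // GPU device 0 (placeholder device id)
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL; // also tried ORT_PARALLEL
options.InterOpNumThreads = 1;  // placeholder; we tried several values
options.IntraOpNumThreads = 4;  // placeholder; we tried several values
options.EnableMemoryPattern = true; // also tried disabling this

var session = new InferenceSession("yolov8.onnx", options); // path is a placeholder
```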
For comparison, the same model runs in about 35 ms on an RTX 4090, while on the RTX A5000 we get around 90 ms. Both machines use the same CUDA version (11.2). Our target is 35-40 ms when we deploy on the RTX A5000.
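For completeness, this is roughly how we measure those numbers, continuing from the session created above (a sketch: "images" is the input name in our ONNX export, and we discard a few warm-up runs so one-time CUDA/cuDNN initialization is not counted):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// 1x3x1024x1024 float input, matching the training resolution.
var tensor = new DenseTensor<float>(new[] { 1, 3, 1024, 1024 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("images", tensor)
};

// Warm-up runs so first-run initialization is excluded from the timing.
for (int i = 0; i < 5; i++)
    session.Run(inputs).Dispose();

const int runs = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < runs; i++)
    using (var results = session.Run(inputs)) { }
sw.Stop();
Console.WriteLine($"Average inference: {sw.ElapsedMilliseconds / (double)runs:F1} ms");
```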