I am using NVIDIA Triton Inference Server and an ONNX model for inference on a GPU instance.
The Dockerfile, which contains the environment, the inference server and the models, has the following FROM/pip lines:
```dockerfile
FROM --platform=linux/amd64 nvcr.io/nvidia/tritonserver:23.12-py3
RUN pip install torch transformers onnx onnxruntime-gpu onnxruntime
```
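The `RUN` line above installs both `onnxruntime` and `onnxruntime-gpu`. ONNX Runtime's installation docs advise keeping only one of these packages in an environment, because both provide the same `onnxruntime` module. A small sketch for checking which of them actually ended up in the image (the helper name is hypothetical; the distribution names are the ones from the `RUN` line):

```python
from importlib import metadata

# Distribution names taken from the Dockerfile's RUN pip install line.
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu")


def installed_ort_distributions(names=ORT_DISTRIBUTIONS):
    """Return {distribution name: version} for each listed package that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


if __name__ == "__main__":
    found = installed_ort_distributions()
    print(found)
    if len(found) > 1:
        print("Warning: more than one ONNX Runtime package is installed; "
              "for GPU inference keep only onnxruntime-gpu.")
```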
The `model.py` for the Triton Inference Server has been simplified to the following:
```python
import onnxruntime as ort
import torch
import numpy as np

# Providers are tried in order: CUDA first, CPU as the fallback.
session = ort.InferenceSession("path/to/onnx.model", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
...
io_binding = session.io_binding()

# Input tensor allocated directly on GPU 0; contiguous() guarantees a dense buffer behind data_ptr().
pt_script_embeddings = torch.rand(
    size=(100, 2010), dtype=torch.float32, device="cuda:0"
).contiguous()
# Bind the GPU buffer as the model input so it is not copied through the host.
io_binding.bind_input(
    name="np_script_embeddings",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(pt_script_embeddings.shape),
    buffer_ptr=pt_script_embeddings.data_ptr(),
)

# Preallocate the output on GPU 0 and bind it so ONNX Runtime writes the logits into it.
logit_output_shape = (100, 2)
logit_output = torch.empty(logit_output_shape, dtype=torch.float32, device='cuda:0').contiguous()
io_binding.bind_output(
    name="np_logits",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(logit_output.shape),
    buffer_ptr=logit_output.data_ptr(),
)

session.run_with_iobinding(io_binding)
# Copy the logits back to the host as a NumPy array.
outputs = logit_output.cpu().numpy()
```
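If the CUDA execution provider cannot be loaded (for example because of a CUDA version mismatch), ONNX Runtime typically logs a warning and continues with the next provider in the list, here `CPUExecutionProvider`, so the problem only shows up later at binding time. A minimal guard, assuming it is called right after the `ort.InferenceSession(...)` line, turns that into an explicit failure. `require_cuda_provider` is a hypothetical helper; it only relies on `InferenceSession.get_providers()`, which lists the providers the session actually uses:

```python
def require_cuda_provider(session):
    """Raise early if the session did not actually get the CUDA execution provider.

    `session` is expected to be an onnxruntime.InferenceSession; only its
    get_providers() method is used, so any object exposing it works.
    """
    providers = session.get_providers()
    if "CUDAExecutionProvider" not in providers:
        raise RuntimeError(
            "CUDAExecutionProvider is not active (session providers: "
            f"{providers}); check that the onnxruntime-gpu build matches "
            "the CUDA version in the container."
        )
    return providers
```

In `model.py` this would be `require_cuda_provider(session)` placed before `session.io_binding()`.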
Unfortunately, the call to `io_binding.bind_input` raises the error below, which has been causing me a lot of grief:

```
RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
```
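In ONNX Runtime's device numbering, `DeviceType:1` is the GPU and `DeviceType:0` is the CPU, so the session wants the input on the CPU and has no registered way to copy it from the GPU buffer. That usually points to the CUDA execution provider not being loaded. The diagnostic sketch below reports what the container provides and compares CUDA major versions. The function names are hypothetical, comparing only major versions is a simplifying assumption, and `get_available_providers()` lists the providers the installed build supports, which does not guarantee that the CUDA provider loads successfully at session creation:

```python
import importlib


def cuda_major(version):
    """Return the major number of a CUDA version string such as "12.2", or None."""
    if not version:
        return None
    return int(str(version).split(".")[0])


def collect_versions():
    """Report what the environment provides; packages that fail to import are reported as None."""
    info = {"torch": None, "torch_cuda": None, "cuda_available": None,
            "onnxruntime": None, "ort_providers": None}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass
    try:
        ort = importlib.import_module("onnxruntime")
        info["onnxruntime"] = ort.__version__
        # Providers supported by this build, e.g. CUDAExecutionProvider.
        info["ort_providers"] = ort.get_available_providers()
    except ImportError:
        pass
    return info


def cuda_majors_match(torch_cuda, ort_cuda):
    """True when both CUDA versions are known and share a major version."""
    a, b = cuda_major(torch_cuda), cuda_major(ort_cuda)
    return a is not None and a == b


if __name__ == "__main__":
    info = collect_versions()
    for key, value in info.items():
        print(f"{key}: {value}")
    # Assumption: the CUDA version onnxruntime-gpu was built for is looked up
    # in the CUDA execution provider requirements and entered by hand.
    ort_built_for_cuda = "12.2"
    print("CUDA majors match:", cuda_majors_match(info["torch_cuda"], ort_built_for_cuda))
```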
Note: I reviewed several articles before submitting this question on SO.
To resolve the issue, I needed to carefully match the versions of `cuda`, `pytorch` and `onnxruntime` provided by the `tritonserver` Docker image with the Python packages `torch` and `onnxruntime-gpu` that I installed manually. Here is the process in detail:

1. Find the CUDA version required by `onnxruntime-gpu` by visiting the ONNX Runtime CUDA execution provider page. In my case it was `cuda==12.2`.
2. Find the `tritonserver` Container Version with the CUDA version matching the prior step. In my case it was `tritonserver:23.10-py3`.
3. Note the matching `torch` version. In my case it was `torch 2.1`.
4. Based on the collected versions, update the environment. In my case, that meant updating the Docker image to use these versions.
NOTE: if your build environment has no access to the Azure repo at https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ then retrieve and install the files manually from https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12 (make sure to replace `cuda-12` with your CUDA version).
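Installing `onnxruntime-gpu` from that feed uses pip's `--extra-index-url` option. The sketch below only assembles the command for a given CUDA major version by substituting it into the feed name, as the note suggests; nothing is executed. The function name is hypothetical, and whether a feed exists for a CUDA major version other than 12 should be checked on the Azure artifacts page first:

```python
import shlex

# Feed URL from the note above, with the CUDA major version as a placeholder.
FEED_TEMPLATE = (
    "https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/"
    "onnxruntime-cuda-{cuda_major}/pypi/simple/"
)


def ort_gpu_install_command(cuda_major=12, version=None):
    """Build pip arguments for installing onnxruntime-gpu from the CUDA-specific feed.

    `version` optionally pins onnxruntime-gpu to an exact release.
    """
    package = "onnxruntime-gpu" if version is None else f"onnxruntime-gpu=={version}"
    return [
        "pip", "install", package,
        "--extra-index-url", FEED_TEMPLATE.format(cuda_major=cuda_major),
    ]


if __name__ == "__main__":
    # The printed line can be used in a Dockerfile RUN instruction.
    print(shlex.join(ort_gpu_install_command(12)))
```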