ONNX Runtime: io_binding.bind_input causing "no data transfer from DeviceType:1 to DeviceType:0"


I am using the Nvidia Triton Inference Server with an ONNX model for inference on a GPU instance. The Dockerfile defining the environment, inference server, and models contains the following FROM/pip lines:

FROM --platform=linux/amd64 nvcr.io/nvidia/tritonserver:23.12-py3

RUN pip install torch transformers onnx onnxruntime-gpu onnxruntime

The model.py for the Triton Inference Server has been simplified to the following:

import onnxruntime as ort
import torch
import numpy as np

session = ort.InferenceSession("path/to/onnx.model", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

...

io_binding = session.io_binding()
pt_script_embeddings = torch.rand(
    size=(100, 2010), dtype=torch.float32, device="cuda:0"
).contiguous()

io_binding.bind_input(
    name="np_script_embeddings",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(pt_script_embeddings.shape),
    buffer_ptr=pt_script_embeddings.data_ptr(),
)

logit_output_shape = (100, 2)
logit_output = torch.empty(logit_output_shape, dtype=torch.float32, device='cuda:0').contiguous()
io_binding.bind_output(
    name="np_logits",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(logit_output.shape),
    buffer_ptr=logit_output.data_ptr()
)

session.run_with_iobinding(io_binding)
outputs = logit_output.cpu().numpy()

Unfortunately, the error below is triggered at the io_binding.bind_input call, causing me a lot of grief:

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]


2 Answers

Accepted answer by Dan M:

To resolve the issue I needed to carefully match the versions of CUDA, PyTorch, and ONNX Runtime provided by the tritonserver Docker image with the torch and onnxruntime-gpu Python packages installed manually. Here is the process in detail:

  • Check which CUDA version is currently supported by onnxruntime-gpu by visiting the ONNX Runtime CUDA execution provider page. In my case it was cuda==12.2
  • Navigate to the Triton Inference Server release notes and look for the container version with the matching CUDA version from the previous step. In my case it was tritonserver:23.10-py3
  • Navigate to the Triton Inference Server version matrix to retrieve the version of PyTorch included with that Docker image. In my case it was torch 2.1

Based on the collected versions, update the environment. In my case that meant the following changes to the Docker image:

FROM --platform=linux/amd64 nvcr.io/nvidia/tritonserver:23.10-py3

RUN pip install transformers
RUN pip install torch==2.1

# https://onnxruntime.ai/docs/install/
# https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
RUN pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

NOTE: if your build environment has no access to the Azure repo https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/, then retrieve and install the wheels manually from https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12 (make sure to adjust cuda-12 to your CUDA version).
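
Once the image is rebuilt, a quick sanity check inside the container can confirm that the versions actually line up. This is a minimal sketch; the exact version strings will depend on your image:

import torch
import onnxruntime as ort

# CUDA version that PyTorch was built against (should match the Triton image, e.g. 12.2)
print(torch.__version__, torch.version.cuda)

# ONNX Runtime version and the devices/providers it can actually register
print(ort.__version__)
print(ort.get_device())                # expected: "GPU"
print(ort.get_available_providers())   # expected to include "CUDAExecutionProvider"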

Answer by simeonovich:

This is most likely just an installation issue.

The error you are getting

There's no data transfer registered for copying tensors from DeviceType:1 to DeviceType:0

means that ORT is trying to copy data from the GPU (DeviceType:1, the torch tensor you explicitly initialized on cuda:0) to the CPU (DeviceType:0, where the InferenceSession lives).

Since 1) this transfer should be possible and 2) the InferenceSession should be on the GPU in the first place, what is probably happening is that the redundant installation of onnxruntime after onnxruntime-gpu in your Dockerfile is messing up the dependencies, i.e. your code is using the onnxruntime installation that does not have GPU support.

Add print(session.get_providers()) to confirm that your session is falling back to just the CPUExecutionProvider, and try rebuilding the container without the unnecessary onnxruntime installation.
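
A minimal sketch of that check, reusing the session setup from the question:

import onnxruntime as ort

session = ort.InferenceSession(
    "path/to/onnx.model",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# If this prints only ['CPUExecutionProvider'], the GPU build of onnxruntime
# was not picked up, and binding GPU buffers will fail as described above.
print(session.get_providers())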