We recently set up rootless Docker alongside our existing Docker installation but ran into problems injecting host GPUs into the rootless containers. A workaround was presented in a GitHub issue (toggling no-cgroups to switch between rootful and rootless), with a mention of a better solution coming as an experimental feature in Docker 25: NVIDIA CDI (Container Device Interface) support.

This is a mirror of a GitHub issue from the nvidia-container-toolkit repository. The exact setup can be found there.

We have generated the CDI specification file (YAML) and confirmed the CDI devices:

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
...
INFO[0000] Found 5 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=1
nvidia.com/gpu=2
nvidia.com/gpu=4
nvidia.com/gpu=all
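
Since the spec was generated with sudo, one thing that may be worth ruling out is whether the unprivileged account running the rootless daemon can read it. The following is only a minimal sketch of such a check, not a confirmed fix: the helper name check_cdi_spec and the CDI_SPEC variable are hypothetical, and the default path is simply the one passed to --output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA or Docker tool):
# report whether a CDI spec file is readable by the current user.
check_cdi_spec() {
    spec="$1"
    if [ -r "$spec" ]; then
        echo "readable: $spec"
    else
        echo "not readable: $spec"
        return 1
    fi
}

# Default to the path used in the generate command above;
# CDI_SPEC is an assumed override for illustration only.
check_cdi_spec "${CDI_SPEC:-/etc/cdi/nvidia.yaml}" || true

# If nvidia-ctk is installed, list the device names it can resolve
# when run as this (unprivileged) user.
if command -v nvidia-ctk >/dev/null 2>&1; then
    nvidia-ctk cdi list
fi
```

Run as the same user that starts the rootless daemon; a "not readable" result, or a shorter device list than the one above, would at least narrow down where the rootless setup diverges.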

CDI injection works fine for the regular Docker (26.0) instance:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-b6022b4d-71db-8f15-15de-26a719f6b3e1)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-22420f7d-6edb-e44a-c322-4ce539cade19)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-5e3444e2-8577-0e99-c6ee-72f6eb2bd28c)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-dd1f811d-a280-7e2e-bf7e-b84f7a977cc1)

but produces the following error for the rootless (26.0.0) version:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown.

Note that CDI support is still experimental in Docker 25 and requires export DOCKER_CLI_EXPERIMENTAL=enabled.

Does anyone have experience with this use case?
