"Unknown runtime specified nvidia." Previously worked, now can't get docker-compose up to work. (Ubuntu/Docker)

Some background context: I am trying to set up hardware acceleration on a jellyfin container I have. This worked initially, but seems to have suddenly stopped working after I tried to restart the container.

I previously had a container set up using the nvidia runtime. I have the nvidia-docker2 package installed and followed all the steps in the nvidia installation guide. I've double-checked that my drivers are up to date. From everything I can tell, this should be working.

Here's a snapshot of the relevant packages I can see installed:

ii  nvidia-container-runtime              3.13.0-1                                all          NVIDIA container runtime
ii  nvidia-container-toolkit              1.13.5-1                                amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base         1.13.5-1                                amd64        NVIDIA Container Toolkit Base
ii  nvidia-dkms-535                       535.98-0ubuntu0~gpu22.04.1              amd64        NVIDIA DKMS package
ii  nvidia-docker2                        2.13.0-1                                all          nvidia-docker CLI wrapper

This also worked previously until I had to restart the container for an unrelated reason. Now I can't run docker compose with the nvidia runtime included (leaving it out, everything works fine).

Here's what my docker compose file looks like:

version: '3'
services:
  jellyfin:
    image: jellyfin/jellyfin
    user: 1000:1000
    network_mode: 'host'
    volumes:
      - /path/to/config:/config
      - /path/to/cache:/cache
      - /path/to/media:/media
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

I would appreciate any ideas/tips on how to debug further. I am a bit at a loss since everything should be in the right place given all the documentation/posts I can find and previous experience...

Edit: including my daemon.json file for docker too:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

Best answer:

Ok, so after some digging/roundabout successes, it seems like this is an issue specifically with the snap version of docker and how it interacts with the nvidia container runtime tools.

My first clue was realizing that the daemon.json file the container tools were modifying was not the one used by the snap version of docker (that one lives under /var/snap/docker/current/config on my particular Ubuntu install). Manually adding the runtime definitions there got me a step further (docker now knew about the nvidia runtime), but it was still unable to reach some of the libraries it needed to run. One clue I found there was that the snap version of docker can only access files in $HOME.
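
For anyone wanting to check whether they are in the same situation, here is a rough sketch of the check described above. It assumes the non-snap config lives at the default /etc/docker/daemon.json and that the snap one is at the path from my install; the helper name has_nvidia_runtime is just something made up for this example, and the grep is a crude text match rather than real JSON parsing.

```shell
#!/bin/sh
# has_nvidia_runtime FILE (hypothetical helper):
# succeeds if FILE exists and contains the string "nvidia" in quotes.
# This is a plain text match, not a JSON parse.
has_nvidia_runtime() {
    [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

# Report where the docker binary on PATH comes from.
# Snap-installed commands are exposed under /snap/bin.
docker_bin=$(command -v docker || true)
case "$docker_bin" in
    /snap/*) echo "docker is the snap build: $docker_bin" ;;
    "")      echo "docker not found on PATH" ;;
    *)       echo "docker binary: $docker_bin" ;;
esac

# Check both candidate config files (paths are assumptions, see above).
for f in /etc/docker/daemon.json /var/snap/docker/current/config/daemon.json; do
    if has_nvidia_runtime "$f"; then
        echo "$f: nvidia runtime defined"
    elif [ -f "$f" ]; then
        echo "$f: present, no nvidia runtime"
    else
        echo "$f: not found"
    fi
done
```

If docker turns out to be the snap build and only /etc/docker/daemon.json has the nvidia entry, that matches what I ran into.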

I'm sure there's still a way to link everything up given that knowledge, but I decided to try uninstalling the snap version of docker and reinstalling through apt. That worked like a charm!
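
Roughly, the swap looks like the sketch below. Treat it as an outline under assumptions rather than the exact commands: it uses Ubuntu's docker.io package (Docker's own apt repository with docker-ce would be another option), and the run and swap_snap_docker_for_apt helpers are names invented for this example. It only prints the commands unless APPLY=1 is set, so it can be read through before anything is changed.

```shell
#!/bin/sh
# Dry run by default: print each command instead of running it.
# Set APPLY=1 in the environment to actually execute (needs sudo).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical helper: remove snap docker, install docker from apt.
# docker.io is an assumption; the package choice is up to you.
swap_snap_docker_for_apt() {
    run sudo snap remove docker
    run sudo apt-get update
    run sudo apt-get install -y docker.io
    run sudo systemctl restart docker
    # Afterwards, the nvidia runtime should show up in the runtime list.
    run docker info --format '{{json .Runtimes}}'
}

swap_snap_docker_for_apt
```

Removing the snap may also remove data it manages, so it is worth backing up anything you care about before running this with APPLY=1.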