When I run nvidia-smi, I get this error:
Failed to initialize NVML: Driver/library version mismatch
But when I run nvcc --version, I get this output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
I ran lsmod | grep nvidia and the output is:
nvidia_uvm 1200128 0
nvidia_drm 65536 14
nvidia_modeset 1200128 11 nvidia_drm
nvidia 35483648 1148 nvidia_uvm,nvidia_modeset
drm_kms_helper 307200 1 nvidia_drm
drm 618496 18 drm_kms_helper,nvidia,nvidia_drm
which nvidia-smi outputs: /usr/bin/nvidia-smi
The output of ps aux | grep nvidia-persistenced is:
jasur 814285 0.0 0.0 18980 2772 pts/3 S+ 13:04 0:00 grep --color=auto nvidia-persistenced
I am using Ubuntu 20.04.
I also encountered this issue and was able to resolve it.
Firstly, many people can resolve this problem simply by restarting the system, so try that first. If that doesn't work, you may need to reinstall the NVIDIA driver. I am using an LXC container, and because the container shares the host's kernel (and therefore the host's NVIDIA kernel module), an inadvertent driver upgrade means the user-space NVIDIA libraries in the container no longer match the kernel module's version. Running nvidia-smi then results in:
Failed to initialize NVML: Driver/library version mismatch
We can get more detailed information about this error, including the exact versions that disagree, from the kernel log.
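For example (this is one common way; the exact NVRM wording depends on the driver version), running:

dmesg | grep NVRM

should print something like:

NVRM: API mismatch: the client has the version 470.223.02, but
NVRM: this kernel module has the version 470.182.03.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.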
Therefore, the simplest solution is to install the same driver version as the host machine.
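To find out which version that actually is, you can check the loaded kernel module's version, for example:

cat /proc/driver/nvidia/version    # version of the kernel module the container shares with the host
nvidia-smi                         # run on the host itself; the header shows the host's driver version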
We should clean up our original driver:
Uninstall the old version of the driver:
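How exactly depends on how the old driver was installed. Assuming it came from the Ubuntu packages, something like this removes it; if it was installed from a .run file instead, sudo nvidia-uninstall does the same job:

sudo apt-get remove --purge '^nvidia-.*'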
Run the repair command:
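On Ubuntu the repair step is typically apt's fix-broken install:

sudo apt-get -f install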
Here, it may prompt us to clean up libraries with no dependencies. You can clear them with:
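That is, remove the packages nothing depends on anymore:

sudo apt-get autoremove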
I recommend manually searching for the installer of the exact version, because the following command only lets you choose the driver branch, not the exact point release, so the version it installs may differ from the host's and cause the same error:
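For example, installing through the Ubuntu repository only pins the branch, and the point release it pulls in may not match the host:

sudo apt install nvidia-driver-470    # may install e.g. 470.223.02 instead of the host's 470.182.03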
Therefore, we should manually find the driver version 470.182.03 to replace 470.223.02. I downloaded NVIDIA-Linux-x86_64-470.182.03.run.
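The .run installer can be fetched directly from NVIDIA; the URL below follows NVIDIA's usual download layout, so double-check it on nvidia.com if it no longer resolves:

wget https://us.download.nvidia.com/XFree86/Linux-x86_64/470.182.03/NVIDIA-Linux-x86_64-470.182.03.run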
Now we need to install the driver. Before installing it, make sure that no processes are using the GPU; we can kill them manually or restart the host machine. Then run the installer:
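A minimal sketch of this step, assuming the .run file downloaded above (the fuser check is just one way to confirm the GPU is idle):

sudo fuser -v /dev/nvidia*    # list anything still using the GPU; kill those processes first
sudo sh ./NVIDIA-Linux-x86_64-470.182.03.run --no-kernel-module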
Because the driver inside the container does not need to install the kernel module (it uses the host's), we add --no-kernel-module at the end.
After installing the GPU driver in the container, restart it and run nvidia-smi to check that the driver was installed successfully.
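If nvidia-smi works again, you can also confirm that the reported driver version now matches the host's, for example:

nvidia-smi --query-gpu=driver_version --format=csv,noheader    # should print 470.182.03 here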