This problem is new and has never happened before, so there may have been an update to the NVIDIA driver or to libtorch. I am using Google Colab for the extra GPU and want to install a program that needs libtorch. Installing it worked fine for the last couple of weeks, but starting today the program can no longer be installed. I have already restarted the runtime and rebooted several times, and nothing seems to help. I also downloaded the new libtorch version for CUDA 11.3 and updated CUDA so that the runtime uses CUDA 11.3. When I call
!nvidia-smi
it prints the usual information. However, after setting the libtorch environment variables that the program needs, using
os.environ['LIBTORCH'] = "/content/libtorch"
and
os.environ['LD_LIBRARY_PATH'] = "/content/libtorch/lib"
!nvidia-smi
suddenly displays "Failed to initialize NVML: Driver/library version mismatch". Since this started happening, I can no longer install the program.
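One thing worth ruling out, offered as an assumption and not a confirmed diagnosis: the assignment to `LD_LIBRARY_PATH` above replaces the whole variable, so any directories the Colab runtime had already put there are dropped, which could change which NVIDIA libraries `nvidia-smi` ends up loading. The minimal sketch below prepends the libtorch directory instead of overwriting. The helper name `prepend_path_var` is hypothetical; only the two paths come from the post.

```python
import os

def prepend_path_var(name: str, entry: str) -> str:
    """Put `entry` at the front of a path-list variable such as LD_LIBRARY_PATH,
    keeping the existing entries and dropping any duplicate of `entry`."""
    current = os.environ.get(name, "")
    parts = [p for p in current.split(os.pathsep) if p and p != entry]
    os.environ[name] = os.pathsep.join([entry] + parts)
    return os.environ[name]

# Paths taken from the post; prepend_path_var is a hypothetical helper.
LIBTORCH = "/content/libtorch"
os.environ["LIBTORCH"] = LIBTORCH
prepend_path_var("LD_LIBRARY_PATH", os.path.join(LIBTORCH, "lib"))
```

Printing `os.environ.get("LD_LIBRARY_PATH")` before running this shows what the original assignment was discarding. Whether keeping those entries makes the NVML message go away depends on how the runtime is set up.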
These are my steps. I install rustc (the program requires rustup) and add it to PATH with
os.environ['PATH'] += os.pathsep + "path/to/.cargo/bin"
Then I set the libtorch environment variables as above and run cargo install for the program. This used to work, but now it fails with the following error:
error: linking with `cc` failed: exit status: 1
= note: "cc" "-m64" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-Wl,--as-needed" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib"
.........................................
= note: /usr/bin/ld: cannot find -ltorch_cuda
/usr/bin/ld: cannot find -ltorch_cuda_cu
/usr/bin/ld: cannot find -ltorch_cuda_cpp
/usr/bin/ld: cannot find -ltorch_cpu
/usr/bin/ld: cannot find -ltorch
/usr/bin/ld: cannot find -lc10
collect2: error: ld returned 1 exit status
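A quick sanity check, assuming the program's build derives its `-L` search path from `LIBTORCH` (as libtorch-based Rust crates such as `tch` commonly do): GNU ld resolves each `-l<name>` by looking for `lib<name>.so` or `lib<name>.a` in the directories passed with `-L`. `LD_LIBRARY_PATH` only affects the loader at run time and does not add directories to this search. The sketch lists which of the libraries named in the error are absent from `$LIBTORCH/lib`. If the new CUDA 11.3 download lays out its libraries differently from what the build expects, the check should show it. The helper `missing_libs` is hypothetical.

```python
import os
from pathlib import Path

# Library names copied from the "cannot find -l..." lines of the linker error.
REQUIRED = ["torch_cuda", "torch_cuda_cu", "torch_cuda_cpp", "torch_cpu", "torch", "c10"]

def missing_libs(lib_dir, names=REQUIRED):
    """Return the names for which neither lib<name>.so nor lib<name>.a exists
    in lib_dir, mirroring how ld resolves -l<name> in a search directory."""
    d = Path(lib_dir)
    return [n for n in names
            if not (d / f"lib{n}.so").exists() and not (d / f"lib{n}.a").exists()]

# Falls back to the path used in the post if LIBTORCH is not set.
lib_dir = os.path.join(os.environ.get("LIBTORCH", "/content/libtorch"), "lib")
if not os.path.isdir(lib_dir):
    print(f"{lib_dir} does not exist")
else:
    print("missing:", missing_libs(lib_dir) or "none")
```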