Google Colab: nvidia-smi and libtorch not compatible anymore


The problem is new and has never happened before, so there might have been an update of the NVIDIA driver or of libtorch. Problem: I am using Google Colab for its GPU and want to install a program that needs libtorch. Installation worked fine for the last couple of weeks, but starting today the program can no longer be installed. I have already restarted and rebooted the runtime several times, and nothing helps. I also downloaded the new libtorch build for CUDA 11.3 and updated CUDA so that the runtime runs on CUDA 11.3. When I call

    !nvidia-smi

it prints the usual information. However, after setting the environment variables that libtorch needs, using

    os.environ['LIBTORCH'] = "/content/libtorch" 

and

    os.environ['LD_LIBRARY_PATH'] = "/content/libtorch/lib" 
    !nvidia-smi

nvidia-smi suddenly displays "Failed to initialize NVML: Driver/library version mismatch". Since this started happening, I cannot install the program anymore.
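One thing that may be worth ruling out: the LD_LIBRARY_PATH assignment above replaces whatever value was set before instead of extending it, so any directories that were already on the library search path are dropped. A minimal sketch, assuming that is relevant here, which puts the libtorch directory first and keeps the previous entries. The name prepend_env_path is a hypothetical helper, not part of any library, and the demo only works on a copy of the environment:

```python
import os

def prepend_env_path(var, directory, env=None):
    """Put `directory` first in a path-list variable, keeping the existing entries."""
    env = os.environ if env is None else env
    existing = [p for p in env.get(var, "").split(os.pathsep) if p]
    # Drop an earlier copy of the directory so re-running the cell does not duplicate it.
    entries = [directory] + [p for p in existing if p != directory]
    env[var] = os.pathsep.join(entries)
    return env[var]

# Preview on a copy of the environment; call without `env` to change os.environ itself.
preview = dict(os.environ)
print("before:", preview.get("LD_LIBRARY_PATH", "<unset>"))
print("after: ", prepend_env_path("LD_LIBRARY_PATH", "/content/libtorch/lib", preview))
```

Comparing the "before" and "after" lines shows which directories the plain assignment would have removed.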

So, I install rustc (since the program requires rustup) and add it to the PATH with

    os.environ['PATH'] += os.pathsep + "path/to/.cargo/bin"

I add libtorch as an environment variable. Then I try to cargo-install the program. This used to work fine; now it fails with the error message:

    error: linking with `cc` failed: exit status: 1

      = note: "cc" "-m64" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-Wl,--as-needed" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib"
    .........................................
      = note: /usr/bin/ld: cannot find -ltorch_cuda
              /usr/bin/ld: cannot find -ltorch_cuda_cu
              /usr/bin/ld: cannot find -ltorch_cuda_cpp
              /usr/bin/ld: cannot find -ltorch_cpu
              /usr/bin/ld: cannot find -ltorch
              /usr/bin/ld: cannot find -lc10
              collect2: error: ld returned 1 exit status
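A "cannot find -l<name>" message means the linker found neither lib<name>.so nor lib<name>.a in the directories it searched. A rough sketch for checking which of the names from the output above are present in the libtorch lib directory. It assumes the directory is the lib subfolder of LIBTORCH (falling back to /content/libtorch), and missing_libraries is a hypothetical helper name:

```python
import os

# Library names taken from the "cannot find -l..." lines in the linker output.
LINKER_NAMES = ["torch_cuda", "torch_cuda_cu", "torch_cuda_cpp", "torch_cpu", "torch", "c10"]

def missing_libraries(lib_dir, names=LINKER_NAMES):
    """Return the names for which neither lib<name>.so nor lib<name>.a exists in lib_dir."""
    missing = []
    for name in names:
        candidates = [os.path.join(lib_dir, f"lib{name}{ext}") for ext in (".so", ".a")]
        if not any(os.path.isfile(c) for c in candidates):
            missing.append(name)
    return missing

# Assumption: the libraries are expected under $LIBTORCH/lib.
lib_dir = os.path.join(os.environ.get("LIBTORCH", "/content/libtorch"), "lib")
print("searched:", lib_dir)
print("missing: ", missing_libraries(lib_dir))
```

If every name is reported missing, the directory itself is probably wrong or empty. If only some are missing, the downloaded libtorch build may not ship those libraries.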