OpenACC nvlink undefined reference to class

572 Views Asked by At

I am new to OpenACC and I am writing a new program from scratch (I have a fairly good idea what loops will be computationally costly from working in a similar problem before). I am getting an "Undefined reference" from nvlink. From my research, I found this is because no device code is being generated for the class I created. However, I don't understand why this is happening and how to fix it.

Below I send a MWE from my code.

include/vec1.h

#ifndef VEC1_H
#define VEC1_H

class Vec1{
    
public:
    double data[1];

    #pragma acc routine seq
    Vec1();
    #pragma acc routine seq
    Vec1(double x);
    
    #pragma acc routine seq
    Vec1 operator* (double x);

};

#endif

src/vec1.cpp

#include "vec1.h"

Vec1::Vec1(){
    data[0] = .0;
}

Vec1::Vec1(double x){
    data[0] = x;
}


Vec1 Vec1::operator*(double c){
    Vec1 r = Vec1(0.);
    r.data[0] = c*data[0];
    return r;
}

vec1_test_gpu.cpp

#include "vec1.h"

#define NUM_VECTORS 1000000

int main(){
    
    Vec1 vec1_array[NUM_VECTORS];
    for(int iv=0; iv<NUM_VECTORS; ++iv){
        vec1_array[iv] = Vec1(iv);
    }
    #pragma acc data copyin(vec1_array)
    
    #pragma acc parallel loop
    for(int iv=0; iv<NUM_VECTORS; ++iv){
        vec1_array[iv] = vec1_array[iv]*2;
    }
    return 0;
}

I compile them in the following way

$ nvc++ src/vec1.cpp -c -I./include -O3 -march=native -ta=nvidia:cuda11.2 -fPIC
$ nvc++ -shared -o libvec1.so vec1.o
$ nvc++ vec1_test_gpu.cpp -I./include -O3 -march=native -ta=nvidia:cuda11.2 -L./ -lvec1

The error message appears just after the last command and reads nvlink error : Undefined reference to '_ZN4Vec1mlEd' in '/tmp/nvc++jOtCBiT_m38d.o'

1

There are 1 best solutions below

1
On BEST ANSWER

The problem here is that you're trying to call a device routine, "Vec1::operator*", that's contained in a shared object from a kernel in the main program. nvc++'s OpenACC implementation uses CUDA to target NVIDIA devices. Since CUDA doesn't have a dynamic linker for device code, at least not yet, this isn't supported.

You'll need to either link this statically, or move the "parallel loop" into the shared object.

Note that the "-ta" flag has been deprecated. Please consider using "-acc -gpu=cuda11.2" instead.