Cilk and CUDA combination and compilation


I have a program that consists of three files: one .c file (utils.c) and two .cu files (nn.cu and parallel.cu). The main function is in nn.cu, and the code from utils.c is declared as extern "C" in parallel.cu. The program runs perfectly without Cilk. I want to parallelize it further, so I tried Cilk, using _Cilk_spawn and _Cilk_sync:

int main(int argc, char* argv[] ) {

    clock_t begin = clock();

    srand((unsigned)time(NULL));

    int n_inputs = atoi(argv[2]);
    int n_hidden = atoi(argv[3]);
    int n_outputs = atoi(argv[4]);

    // Build output layer
    NeuralNet nn = buildNeuralNet(n_inputs, n_outputs, n_hidden);

    // Build training samples
    int _p1[] = {0,0};
    Pattern p1 = makePatternSingleOutput(_p1, 0);
    int _p2[] = {0,1};
    Pattern p2 = makePatternSingleOutput(_p2, 1);
    int _p3[] = {1,1};
    Pattern p3 = makePatternSingleOutput(_p3, 1);
    int _p4[] = {1,0};
    Pattern p4 = makePatternSingleOutput(_p4, 1);

    Pattern patterns[] = {p3, p2, p1, p4};

    // Train the network
    _Cilk_spawn train_network(patterns, 4, atoi(argv[1]), nn);

    printf("\n\nTesting the network\n");

    _Cilk_sync;

    _Cilk_spawn update_pattern(p2, nn);
    for (int i=0; i < nn.n_outputs; i++) {
        printf("Output: %f, expected: %i\n", nn.out_output[i], p2.result[i]);
        printf("NN Error : %f\n", 1.0f - nn.out_output[i]);
    }
    cudaDeviceReset();

    _Cilk_sync;

    clock_t end = clock();
    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Runtime : %f\n", time_spent);

    return 0;

}

The problem appears when I try to compile everything together with nvcc:

$ nvcc -Wno-deprecated-gpu-targets -o my_nn_cilk nn.cu parallel.cu -lm
nn.cu(241): error: identifier "_Cilk_spawn" is undefined

nn.cu(241): error: expected a ")"

nn.cu(245): error: identifier "_Cilk_sync" is undefined

nn.cu(247): error: identifier "_Cilk_spawn" is undefined

nn.cu(247): error: expected a ")"

5 errors detected in the compilation of "/tmp/tmpxft_00003b52_00000000-14_nn.cpp1.ii".

The two functions that I spawn with _Cilk_spawn launch the desired CUDA kernels. The errors stay the same even if I add -lcilkrts to the nvcc command. I also have #include "cilk/cilk.h" at the beginning of the code.

Can you please help me? Why do these errors appear, and why does the program fail to compile? Thank you in advance!

Answer by parallel highway:

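It does not compile because nvcc does not support the Cilk keywords: its front end parses _Cilk_spawn and _Cilk_sync as undefined identifiers. Adding -lcilkrts does not help, because that option only affects linking and these errors occur at compile time. You need a wrapper that calls the CUDA functions from your Cilk code, so that the CUDA code is compiled by nvcc and the Cilk code by a compiler that supports Cilk. Here is a sample on how to write a wrapper and call it from your Cilk code: cilk with cuda sample.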

The link also explains how to compile the CUDA code and the Cilk code and how to link them.
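
As a rough illustration of the wrapper idea (this is not the code from the linked sample), here is a minimal sketch of the host side, assuming a Cilk Plus toolchain that provides cilk/cilk.h. All names in it (scale_on_gpu, scale_both, USE_CILK, HAVE_CUDA_WRAPPER) are hypothetical. The kernels and their launches would stay in a .cu file that nvcc compiles and that exposes a plain extern "C" function; the file with the Cilk keywords only sees that function's prototype. The CPU stand-in is there only so the sketch also builds without a GPU or a Cilk compiler.

```c
/* host_cilk.c: compiled by a Cilk-capable C compiler, never by nvcc. */
#include <assert.h>
#include <stddef.h>

#ifdef USE_CILK                 /* hypothetical macro: define it when building with Cilk support */
#include <cilk/cilk.h>          /* provides cilk_spawn / cilk_sync */
#else
#define cilk_spawn              /* serial fallback: the keywords expand to nothing */
#define cilk_sync
#endif

/* Prototype of the hypothetical wrapper. In the real build it would be
 * defined in a .cu file, roughly like this:
 *
 *   extern "C" void scale_on_gpu(float *data, int n, float factor) {
 *       // cudaMalloc, cudaMemcpy to device, kernel launch,
 *       // cudaMemcpy back to host, cudaFree
 *   }
 *
 * extern "C" gives the function an unmangled name, so C code can link to it. */
void scale_on_gpu(float *data, int n, float factor);

#ifndef HAVE_CUDA_WRAPPER       /* hypothetical macro: define it when linking the nvcc object */
/* CPU stand-in with the same signature, used only when no CUDA object is linked. */
void scale_on_gpu(float *data, int n, float factor) {
    for (int i = 0; i < n; i++)
        data[i] *= factor;
}
#endif

/* Runs two independent wrapper calls, the first one as a spawned Cilk task.
 * The arrays must not overlap, otherwise the two calls would race. */
void scale_both(float *a, float *b, int n) {
    assert(a != NULL && b != NULL && a != b);
    cilk_spawn scale_on_gpu(a, n, 2.0f);   /* may run in parallel with the next call */
    scale_on_gpu(b, n, 3.0f);
    cilk_sync;                             /* both results are ready after this point */
}
```

One possible build, assuming GCC with Cilk Plus support: compile the .cu file to an object with nvcc -c, compile this file with gcc -fcilkplus -DUSE_CILK -DHAVE_CUDA_WRAPPER, and link both objects with -lcilkrts and the CUDA runtime (-lcudart, plus whatever library path your CUDA installation needs, and possibly -lstdc++). The exact flags depend on your toolchain, so treat them as a starting point.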