I have a program that consists of three files: one .c file (utils.c) and two .cu files (nn.cu and parallel.cu). The main function is in nn.cu, and the functions from utils.c are declared as extern "C" in parallel.cu. The program runs perfectly without Cilk, but I want to parallelize it further, so I tried Cilk with _Cilk_spawn and _Cilk_sync:
int main(int argc, char* argv[] ) {
    clock_t begin = clock();
    srand((unsigned)time(NULL));
    int n_inputs = atoi(argv[2]);
    int n_hidden = atoi(argv[3]);
    int n_outputs = atoi(argv[4]);
    // Build output layer
    NeuralNet nn = buildNeuralNet(n_inputs, n_outputs, n_hidden);
    // Build training samples
    int _p1[] = {0,0};
    Pattern p1 = makePatternSingleOutput(_p1, 0);
    int _p2[] = {0,1};
    Pattern p2 = makePatternSingleOutput(_p2, 1);
    int _p3[] = {1,1};
    Pattern p3 = makePatternSingleOutput(_p3, 1);
    int _p4[] = {1,0};
    Pattern p4 = makePatternSingleOutput(_p4, 1);
    Pattern patterns[] = {p3, p2, p1, p4};
    // Train the network
    _Cilk_spawn train_network(patterns, 4, atoi(argv[1]), nn);
    printf("\n\nTesting the network\n");
    _Cilk_sync;
    _Cilk_spawn update_pattern(p2, nn);
    for (int i=0; i < nn.n_outputs; i++) {
        printf("Output: %f, expected: %i\n", nn.out_output[i], p2.result[i]);
        printf("NN Error : %f\n", 1.0f - nn.out_output[i]);
    }
    cudaDeviceReset();
    _Cilk_sync;
    clock_t end = clock();
    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Runtime : %f\n", time_spent);
    return 0;
}
The problem appears when I try to compile everything together with nvcc:
$ nvcc -Wno-deprecated-gpu-targets -o my_nn_cilk nn.cu parallel.cu -lm
nn.cu(241): error: identifier "_Cilk_spawn" is undefined
nn.cu(241): error: expected a ")"
nn.cu(245): error: identifier "_Cilk_sync" is undefined
nn.cu(247): error: identifier "_Cilk_spawn" is undefined
nn.cu(247): error: expected a ")"
5 errors detected in the compilation of "/tmp/tmpxft_00003b52_00000000-14_nn.cpp1.ii".
The two functions that I _Cilk_spawn launch the CUDA kernels I need.
Adding -lcilkrts to the nvcc command gives the same errors.
I also have #include "cilk/cilk.h" at the beginning of the code.
Can you please help me? Why do these errors appear, and why does the code not compile? Thank you in advance!
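The reason it does not compile is that nvcc does not support the Cilk keywords. nvcc parses every .cu file with its own front end, which has no notion of _Cilk_spawn or _Cilk_sync, so it reports them as undefined identifiers. Adding -lcilkrts cannot change that, because the flag only affects the link step, and cilk/cilk.h only defines the cilk_spawn and cilk_sync macros in terms of those keywords. You need a wrapper that calls the CUDA functions from your Cilk code: keep the kernels and their launches in a .cu file built by nvcc, and keep the Cilk keywords in a file built by a Cilk-capable compiler. Here is a sample showing how to write such a wrapper and call it from your Cilk code: cilk with cuda sample.
The link also explains how to compile the CUDA code and the Cilk code separately and how to link them.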
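To make the wrapper idea concrete, here is a minimal single-file sketch of the layout (it is not the code from the linked sample). The names gpu_wrappers.h, train_network_gpu, update_pattern_gpu, run_demo, SPAWN and SYNC are hypothetical, and the two wrapper bodies are host stand-ins so the sketch builds without the CUDA toolkit; in a real project they would sit in parallel.cu and launch your kernels. The __cilk check assumes your Cilk compiler defines that macro; without it the sketch falls back to serial calls.

```c
#include <stdio.h>

/* --- gpu_wrappers.h (hypothetical name), included by both sides --- */
#ifdef __cplusplus
extern "C" {   /* nvcc compiles .cu files as C++, so force C linkage */
#endif
void train_network_gpu(float *weights, int n, int epochs);
void update_pattern_gpu(const float *weights, int n, float *output);
#ifdef __cplusplus
}
#endif

/* --- parallel.cu side: built by nvcc, contains NO Cilk keywords ---
 * Stand-ins only: the real bodies would copy data to the device,
 * launch the kernels and wait for them to finish. */
void train_network_gpu(float *weights, int n, int epochs) {
    for (int e = 0; e < epochs; e++)
        for (int i = 0; i < n; i++)
            weights[i] += 0.5f;
}

void update_pattern_gpu(const float *weights, int n, float *output) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += weights[i];
    *output = sum;
}

/* --- host side: built by a Cilk-capable compiler, contains NO CUDA --- */
#ifdef __cilk   /* assumption: defined when Cilk Plus support is enabled */
#include <cilk/cilk.h>
#define SPAWN cilk_spawn
#define SYNC  cilk_sync
#else           /* serial fallback so any C compiler accepts the file */
#define SPAWN
#define SYNC
#endif

float run_demo(int epochs) {
    float weights[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float output = 0.0f;

    SPAWN train_network_gpu(weights, 4, epochs);
    printf("Testing the network\n");   /* host work that can overlap training */
    SYNC;                              /* weights are safe to read only after this */

    update_pattern_gpu(weights, 4, &output);
    return output;
}
```

Your main would then call run_demo (or contain the same spawn/sync logic directly). With the files split this way, the build would look roughly like: nvcc -c parallel.cu for the CUDA side, a Cilk-capable compiler for the host file (for example gcc -fcilkplus -c main.c on GCC releases that still ship Cilk Plus), and a final link step that adds -lcudart and -lcilkrts. Exact library paths and flags depend on your installation.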