I am trying to implement a simple multiplication of two n×n matrices using OpenMP target offloading. The code is taken from here:
```cpp
template<typename T>
void multiplyJIK(T *A, T *B, T *C, uint64_t size) {
    #pragma omp target data device(0) map(to: A[0:size*size], B[0:size * size], size) map(tofrom: C[0:size * size])
    {
        #pragma omp target teams device(0) num_teams(32768) thread_limit(512) \
            map(to: A[0:size*size], B[0:size * size], size) map(tofrom: C[0:size * size]) \
            default(none) shared(A, B, C, size)
        #pragma omp distribute parallel for num_threads(512) dist_schedule(static, 512) \
            default(none) shared(A, B, C, size)
        for (uint64_t j = 0; j < size; ++j) {
            for (uint64_t i = 0; i < size; ++i) {
                for (uint64_t k = 0; k < size; ++k) {
                    C[i * size + j] += A[i * size + k] * B[k * size + j];
                }
            }
        }
    }
}
```
It should multiply the two matrices `A` and `B` and store the result in `C`. The matrices are represented as one-dimensional arrays of length `size * size`.
For my test, `T` is `float`, and I compile the code with the NVHPC toolkit:

```
nvc++ -std=c++17 -mp=gpu -target=gpu main.cpp -o matmul
```

This fails with the following error:

```
error: item must appear in a SHARED or PRIVATE clause:
C[i * size + j] += A[i * size + k] * B[k * size + j];
^
detected during instantiation of "void Target::multiplyJIK(T *, T *, T *, uint64_t) [with T=float]"
```
I don't understand this error, since the `C` array should be mapped correctly (`map(tofrom: C...)`) and it appears in the `shared(...)` clause. Am I missing something in the code, or is this a problem with the compile flags?