Do 64-bit atomic operations work in OpenCL on AMD cards?


The implementation of emulated atomics in OpenCL following the STREAM blog works nicely for a 32-bit atomic add, on CPUs as well as on NVIDIA and AMD GPUs.

The 64-bit equivalent, based on the cl_khr_int64_base_atomics extension, seems to run properly on CPUs (pocl and Intel) as well as with the NVIDIA OpenCL drivers.

However, I cannot get the 64-bit version to work on AMD GPUs, neither with amdgpu-pro nor with ROCm (3.5.0), running on a Radeon VII and a Radeon Instinct MI50, respectively.

The implementation goes as follows:

inline void atomicAdd(volatile __global double *addr, double val)
{
    union {
        long u64;
        double f64;
    } next, expected, current;
    current.f64 = *addr;
    do {
        expected.f64 = current.f64;
        next.f64 = expected.f64 + val;
        current.u64 = atomic_cmpxchg(
            (volatile __global long *)addr,
            (long) expected.u64,
            (long) next.u64);
    } while( current.u64 != expected.u64 );
}

In the absence of atomic operations for double types, the idea is to reinterpret the value as a long, since the bits only need to be stored and compared (no arithmetic is done on the long). One should then be able to use long atom_cmpxchg(__global long *p, long cmp, long val) as defined in the Khronos manual for int64 base atomics.

The error I receive in both AMD environments suggests that the compiler falls back to the 32-bit versions. It does not seem to recognise the 64-bit versions despite the #pragma:


/tmp/comgr-0bdbdc/input/CompileSource:21:17: error: call to 'atomic_cmpxchg' is ambiguous
                current.u64 = atomic_cmpxchg(
                              ^~~~~~~~~~~~~~
[...]/centos_pipeline_job_3.5/rocm-rel-3.5/rocm-3.5-30-20200528/7.5/out/centos-7/7/build/amd_comgr/<stdin>:13468:12: note: candidate function
int __ovld atomic_cmpxchg(volatile __global int *p, int cmp, int val);
           ^
[...]/centos_pipeline_job_3.5/rocm-rel-3.5/rocm-3.5-30-20200528/7.5/out/centos-7/7/build/amd_comgr/<stdin>:13469:21: note: candidate function
unsigned int __ovld atomic_cmpxchg(volatile __global unsigned int *p, unsigned int cmp, unsigned int val);
                    ^
1 error generated.
Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).

I do find cl_khr_int64_base_atomics in the clinfo extension list in both environments, though. Also, cl_khr_int64_base is present in the OpenCL driver binary file.

Does anybody have an idea what might be going wrong here? Using the same implementation for 32-bit (int and float instead of long and double) works flawlessly for me.

Thanks for any hints.

Best answer

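For 64-bit operands, the function is called atom_cmpxchg and not atomic_cmpxchg. In OpenCL 1.x, atomic_cmpxchg is only overloaded for int and unsigned int, which is why the compiler reports the call as ambiguous when it is given long arguments. The 64-bit versions provided by cl_khr_int64_base_atomics keep the atom_ prefix.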
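
For reference, here is a minimal sketch of the function from the question with that one change applied. The two #pragma lines are assumptions about what the original source contained (the question mentions a #pragma but does not show it), and the accumulate kernel is a hypothetical caller added only for illustration. I have not run this on the ROCm 3.5.0 or amdgpu-pro setups described above.

```c
// Enable the 64-bit base atomics extension, which provides the atom_* functions.
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
// Assumption: enable double support explicitly; required on OpenCL 1.0/1.1,
// harmless on later versions where fp64 is an optional core feature.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

inline void atomicAdd(volatile __global double *addr, double val)
{
    // Reinterpret the double's bits as a long so they can be compared and swapped.
    union {
        long   u64;
        double f64;
    } next, expected, current;

    current.f64 = *addr;
    do {
        expected.f64 = current.f64;
        next.f64     = expected.f64 + val;
        // atom_cmpxchg (not atomic_cmpxchg) is the 64-bit compare-and-exchange.
        // It returns the value found at addr before the attempted swap.
        current.u64 = atom_cmpxchg((volatile __global long *)addr,
                                   expected.u64, next.u64);
    } while (current.u64 != expected.u64);  // retry if another work-item got in first
}

// Hypothetical caller: every work-item adds its input element to sum[0].
__kernel void accumulate(__global const double *in, __global double *sum)
{
    atomicAdd(sum, in[get_global_id(0)]);
}
```

The loop compares the long bit patterns rather than the double values, so it still terminates correctly when the stored value is a NaN, which would never compare equal to itself as a double.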