What is the correct way to support `shfl()` and `shfl_sync()` instructions?

1.1k Views Asked by Blizzard At 29 April 2019 at 21:57

From my understanding, CUDA 10.1 removed the shfl instructions:

PTX ISA version 6.4 removes the following features:

Support for shfl and vote instructions without the .sync qualifier has been removed for .targetsm_70 and higher. This support was deprecated since PTX ISA version 6.0 as documented in PTX ISA version 6.2.

What is the correct way to support shfl future and past CUDA versions?

My current methods (shared below) result in the error using CUDA 10.1:

ptxas ... line 466727; error   : Instruction 'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4

template <typename T>
__device__ static __forceinline__
T _shfl_up(T var, unsigned int delta, int width=WARPSIZE, unsigned mask=MEMBERMASK)
{
#if (__CUDACC_VER_MAJOR__ >= 9)
  var = __shfl_up_sync(mask, var, delta, width);
#else
  var = __shfl_up(var, delta, width);
#endif
  return var;
}

Also, I would like to add that one of the dependencies of my project is CUB and I believe they utilize the same method to split up _sync() and older shfl instructions. I am not sure what I am doing wrong.

Original Q&A

There are 1 best solutions below

Blizzard On 29 April 2019 at 22:47 BEST ANSWER

I was doing the right thing, turns out another dependency didn't have support for sync, created a pull request for it: https://github.com/moderngpu/moderngpu/pull/32

template <typename T>
__device__ static __forceinline__
T _shfl_up(T var, unsigned int delta, int width=WARPSIZE, unsigned mask=MEMBERMASK)
{
#if ( __CUDA_ARCH__ >= 300)
#if (__CUDACC_VER_MAJOR__ >= 9)
  var = __shfl_up_sync(mask, var, delta, width);
#else
  var = __shfl_up(var, delta, width);
#endif
#endif
  return var;
}

What is the correct way to support `shfl()` and `shfl_sync()` instructions?

There are 1 best solutions below

Related Questions in CUDA

Related Questions in PTX

Related Questions in PTXAS

Trending Questions

Popular # Hahtags

Popular Questions

What is the correct way to support `__shfl()` and `__shfl_sync()` instructions?

There are 1 best solutions below

Related Questions in CUDA

Related Questions in PTX

Related Questions in PTXAS

Trending Questions

Popular # Hahtags

Popular Questions

What is the correct way to support `shfl()` and `shfl_sync()` instructions?