AMD HCC Swizzle Intrinsic


I've just recently discovered AMD's equivalent to CUDA's __byte_perm intrinsic: amdgcn_ds_swizzle (or at least I think it's the equivalent of a byte permutation function). My problem is this: CUDA's __byte_perm takes two unsigned 32-bit integers and permutes their bytes according to the value of the selector argument (supplied as a hex value). AMD's swizzle function, however, takes only a single unsigned 32-bit integer and one int named "pattern". How do I use AMD's swizzle intrinsic?

Best answer

ds_swizzle and __byte_perm are quite different. The former permutes a whole register across lanes; the latter picks any four bytes from two 32-bit registers.

AMD's ds_swizzle_b32 GCN instruction actually exchanges values with other lanes. You specify the 32-bit register to read in the source lane and the 32-bit register to place the result in. A hard-coded value (the "pattern") specifies how the lanes are swapped. A great explanation of ds_swizzle_b32 is here, as user3528438 pointed out.
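To make the lane shuffling concrete, here is a host-side C++ sketch of the bitmask mode of ds_swizzle_b32, as I understand it from the GCN3 ISA description: with bit 15 of the pattern clear, the low 15 bits hold three 5-bit masks (and, or, xor), and each lane in a group of 32 reads from lane ((lane & and) | or) ^ xor. The names swizzle_bitmask_pattern and swizzle_bitmask_ref are made up for illustration. The model assumes every lane is active and leaves out the quad-permute mode selected by bit 15, so check the pattern encoding against the ISA manual for your target.

```cpp
#include <array>
#include <cstdint>

// Bitmask mode operates independently on each group of 32 lanes.
constexpr int kGroup = 32;

// Hypothetical helper: pack the three 5-bit masks into the pattern.
// Assumed layout: and_mask = bits [4:0], or_mask = bits [9:5],
// xor_mask = bits [14:10], bit 15 = 0 (bitmask mode).
constexpr std::uint16_t swizzle_bitmask_pattern(unsigned and_mask,
                                                unsigned or_mask,
                                                unsigned xor_mask) {
    return static_cast<std::uint16_t>((and_mask & 0x1Fu) |
                                      ((or_mask & 0x1Fu) << 5) |
                                      ((xor_mask & 0x1Fu) << 10));
}

// Hypothetical host model: in[i] is the register value held by lane i.
// Returns what each lane would hold after the swizzle, assuming all
// 32 lanes are active.
std::array<std::uint32_t, kGroup> swizzle_bitmask_ref(
        const std::array<std::uint32_t, kGroup>& in, std::uint16_t pattern) {
    const unsigned and_mask = pattern & 0x1Fu;
    const unsigned or_mask  = (pattern >> 5) & 0x1Fu;
    const unsigned xor_mask = (pattern >> 10) & 0x1Fu;

    std::array<std::uint32_t, kGroup> out{};
    for (unsigned lane = 0; lane < kGroup; ++lane) {
        // Each lane computes which lane it reads from.
        const unsigned src = ((lane & and_mask) | or_mask) ^ xor_mask;
        out[lane] = in[src];
    }
    return out;
}
```

For example, swizzle_bitmask_pattern(0x1F, 0, 1) gives 0x041F, which makes every even lane trade values with its odd neighbour, while a pattern of 0 makes all 32 lanes read lane 0.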

__byte_perm does not exchange data with other lanes. It just gathers any 4 bytes from two 32-bit registers in its own lane and stores them in a register. There is no cross-lane traffic.
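For comparison, here is a small host-side C++ model of what __byte_perm(x, y, s) computes, following the CUDA documentation: the eight input bytes are numbered 0 to 3 (from x, least significant first) and 4 to 7 (from y), and each of the four low nibbles of the selector picks one of them. The function name byte_perm_ref is mine, and this is only a reference sketch for checking selectors on the CPU, not device code.

```cpp
#include <cstdint>

// Host-side reference model of CUDA's __byte_perm(x, y, s).
// Selector nibble i (bits [4*i+2 : 4*i]) chooses the input byte that
// becomes byte i of the result; only the low 3 bits of each nibble
// are used.
std::uint32_t byte_perm_ref(std::uint32_t x, std::uint32_t y, std::uint32_t s) {
    // Bytes 0..3 come from x, bytes 4..7 come from y.
    const std::uint64_t pool = (static_cast<std::uint64_t>(y) << 32) | x;

    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const unsigned sel = (s >> (4 * i)) & 0x7u;
        const std::uint32_t byte =
            static_cast<std::uint32_t>((pool >> (8 * sel)) & 0xFFu);
        result |= byte << (8 * i);
    }
    return result;
}
```

With x = 0x33221100 and y = 0x77665544, a selector of 0x3210 returns x unchanged, 0x7654 returns y, and 0x0123 reverses the bytes of x to give 0x00112233.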

I'm guessing the next question would be how to do a "byte permute" on AMD GCN hardware. The instruction for that is v_perm_b32 (see page 12-152 here). It selects any four bytes from two specified 32-bit registers.