I have this AVX2 code which I'm trying to extend to AVX-512:
_mm256_permute4x64_epi64(a, _MM_SHUFFLE(3, 1, 2, 0));
The extended code would look like this:
_mm512_permute8x64_epi64(a, _MM_SHUFFLE(7, 5, 3, 1, 6, 4, 2, 0));
But this intrinsic doesn't exist, and _MM_SHUFFLE only takes four arguments.
The intent is to take the low 64-bit element of each 128-bit lane and pack them together into the low 256 bits of the register (the high 64-bit elements end up in the upper 256 bits).
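To make the intended result concrete, here is a minimal scalar sketch of the permutation I'm after. The function name pack_low_qwords_ref and the index table are hypothetical, just for illustration; this is a plain-C reference model of the desired element order, not an AVX-512 implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not an intrinsic): out[i] = in[idx[i]].
   Elements are 64-bit, with element 0 being the lowest in the register. */
static void pack_low_qwords_ref(const uint64_t in[8], uint64_t out[8])
{
    /* Source index for each destination element, lowest element first.
       Written high-to-low, as in the pseudo-call above, this reads
       7, 5, 3, 1, 6, 4, 2, 0. */
    static const uint8_t idx[8] = {0, 2, 4, 6, 1, 3, 5, 7};

    for (size_t i = 0; i < 8; ++i)
        out[i] = in[idx[i]];
}
```

So for an input whose elements (low to high) are e0..e7, the result should be e0, e2, e4, e6, e1, e3, e5, e7.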
How can I extend the original code to AVX-512?