In the Intel Intrinsics Guide, the pseudocode for the operation of _mm_insert_ps includes the following:
FOR j := 0 to 3
i := j*32
IF imm8[j%8]
dst[i+31:i] := 0
ELSE
dst[i+31:i] := tmp2[i+31:i]
FI
ENDFOR
The access into imm8 confuses me: IF imm8[j%8]. As j is within the range 0..3, the modulo 8 part doesn't seem to do anything. Does this maybe signal a convention that I am not aware of? Or is % not "modulo" in this case?
Seems like a pointless modulo.
Intel's documentation for the corresponding asm instruction, insertps, doesn't use any % modulo operations in the pseudocode. It uses ZMASK ← imm8[3:0] and then basically unrolls the part of the pseudocode where the intrinsics guide uses a loop, with a separate check of each ZMASK bit.
This is just showing how the low 4 bits of the immediate perform zero-masking on the 4 dword elements of the final result, after the insert of an element from another vector, or a scalar in memory.
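As a rough illustration of that zero-masking step, here is a minimal scalar sketch in plain C, with no intrinsics. The helper name apply_zmask and the array representation of the vector are my own assumptions for illustration, not anything from Intel's documentation. It shows that for j in 0..3, indexing the immediate with j%8 is the same as indexing it with j.

```c
#include <stdint.h>

/* Hypothetical helper (not an Intel API): models only the zero-masking
   loop from the pseudocode. v[] stands in for the four 32-bit lanes of
   the result after the element insert has already happened. */
static void apply_zmask(uint32_t v[4], unsigned imm8)
{
    for (unsigned j = 0; j < 4; j++) {
        /* j % 8 == j for j in 0..3, so imm8[j%8] is simply bit j of imm8 */
        if ((imm8 >> (j % 8)) & 1u)
            v[j] = 0;          /* lane zeroed when its mask bit is set */
        /* else: the lane keeps the value from the insert step */
    }
}
```

Only bits 3:0 of the immediate are ever inspected here, which matches the ZMASK ← imm8[3:0] view in the asm documentation.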
(There's no intrinsic for insert directly from memory; you'd need an intrinsic for movss and then hope the compiler folds that load into a memory operand for insertps. With a memory source, imm8[7:6] are ignored, just taking that scalar dword as the element to insert (that's the ELSE COUNT_S ← 0 in the asm pseudocode), but then everything else works the same, including the zero-masking you're asking about.)
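To make the register versus memory difference concrete, here is a small scalar sketch of the select-and-insert step that runs before zero-masking. The function name insert_element and the from_memory flag are hypothetical, chosen for illustration only. In the memory case, src[0] stands in for the single scalar dword that was loaded.

```c
#include <stdint.h>

/* Hypothetical model (not an Intel API) of the element selection and
   insert performed before zero-masking.
   from_memory != 0 models the memory-source form of insertps. */
static void insert_element(uint32_t dst[4], const uint32_t src[4],
                           unsigned imm8, int from_memory)
{
    /* COUNT_S: imm8[7:6] for a register source, forced to 0 for memory */
    unsigned count_s = from_memory ? 0u : (imm8 >> 6) & 3u;
    /* COUNT_D: imm8[5:4] selects the destination lane in both forms */
    unsigned count_d = (imm8 >> 4) & 3u;

    dst[count_d] = src[count_s];
}
```

With imm8 = 0xE0 the register form copies source lane 3 into destination lane 2, while the memory form ignores the top two bits and copies the loaded scalar into lane 2.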