How to set to 1 efficiently with AVX2
- the first `N` bits
- the last `N` bits
of an `__m256i`, setting the rest to 0?
These are two separate operations, for the tail and the head of a bit range, where the range may start and end in the middle of an `__m256i` value. The part of the range that occupies full `__m256i` values is processed with all-0 or all-1 masks.
The AVX2 shift instructions `vpsllvd` and `vpsrlvd` have the nice property that shift counts greater than or equal to 32 produce zero in the corresponding 32-bit integers within the ymm register. In other words: the shift counts are not masked, in contrast to the shift counts for the x86 scalar shift instructions. Therefore the code is fairly simple:
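A minimal sketch of such code, assuming GCC or Clang. The function names `bit_mask_avx2_lsb` and `bit_mask_avx2_msb`, the `AVX2_FN` macro, and the `*_words` inspection helpers are illustrative assumptions; only the idea of subtracting `n` from per-lane constants with `_mm256_subs_epu16()` and feeding the result to the variable shifts comes from the text.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX2 per function, so the file also builds
   without -mavx2. With other compilers, drop the macro and enable AVX2 globally. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical name: mask with the n least significant bits set (0 <= n <= 65535). */
AVX2_FN static inline __m256i bit_mask_avx2_lsb(unsigned int n)
{
    __m256i ones       = _mm256_set1_epi32(-1);
    /* Bit index just past the end of each 32-bit lane, lowest lane first. */
    __m256i cnst32_256 = _mm256_setr_epi32(32, 64, 96, 128, 160, 192, 224, 256);
    __m256i shift      = _mm256_set1_epi32((int)n);
    /* Saturating subtract: lanes fully inside the range get shift 0,
       lanes fully outside get a shift >= 32, which vpsrlvd turns into 0. */
    shift = _mm256_subs_epu16(cnst32_256, shift);
    return _mm256_srlv_epi32(ones, shift);
}

/* Hypothetical name: mask with the n most significant bits set (0 <= n <= 65535). */
AVX2_FN static inline __m256i bit_mask_avx2_msb(unsigned int n)
{
    __m256i ones       = _mm256_set1_epi32(-1);
    /* Same constants in reverse order: the highest lane is filled first. */
    __m256i cnst32_256 = _mm256_setr_epi32(256, 224, 192, 160, 128, 96, 64, 32);
    __m256i shift      = _mm256_set1_epi32((int)n);
    shift = _mm256_subs_epu16(cnst32_256, shift);
    return _mm256_sllv_epi32(ones, shift);
}

/* Inspection helpers: store the mask as eight 32-bit words, lowest word first. */
AVX2_FN void bit_mask_avx2_lsb_words(unsigned int n, uint32_t out[8])
{
    _mm256_storeu_si256((__m256i *)out, bit_mask_avx2_lsb(n));
}

AVX2_FN void bit_mask_avx2_msb_words(unsigned int n, uint32_t out[8])
{
    _mm256_storeu_si256((__m256i *)out, bit_mask_avx2_msb(n));
}
```

For example, with `n` = 40 the `lsb` variant fills the lowest word completely and sets the low 8 bits of the next word; the `msb` variant does the mirror image from the top.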
The results are:
For a value `n` with 256 <= `n` <= 65535, all bits are set to one, as one might expect. The upper limit of 65535 is due to the 16-bit saturated arithmetic of `_mm256_subs_epu16()`. With `n` = 65536 the bitmask (the output value) is zero. It is possible to modify the code such that all bits are set to one for the whole range 256 <= `n` <= `INT_MAX`. This can be achieved by replacing `shift = _mm256_subs_epu16(cnst32_256,shift);` with a sequence of three intrinsics that more or less emulates `_mm256_subs_epu32(cnst32_256,shift)`, which doesn't exist.
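One possible three-intrinsic sequence is sketched below. It is an assumption about how such an emulation could look, not a reproduction of the original replacement code; the names `subs_epu32_sketch`, `bit_mask_avx2_lsb_wide`, and `AVX2_FN` are hypothetical. Because the compare is signed, it only behaves like a saturating unsigned subtract while both operands stay within 0..`INT_MAX`, which matches the range stated above.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX2 per function without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: per 32-bit lane, a - b clamped at 0.
   Valid only for 0 <= a, b <= INT_MAX because the compare is signed. */
AVX2_FN static inline __m256i subs_epu32_sketch(__m256i a, __m256i b)
{
    __m256i too_big = _mm256_cmpgt_epi32(b, a);  /* all-ones where b > a      */
    __m256i diff    = _mm256_sub_epi32(a, b);    /* wraps around where b > a  */
    return _mm256_andnot_si256(too_big, diff);   /* force those lanes to zero */
}

/* Hypothetical name: n least significant bits set, all ones for 256 <= n <= INT_MAX. */
AVX2_FN static inline __m256i bit_mask_avx2_lsb_wide(unsigned int n)
{
    __m256i ones       = _mm256_set1_epi32(-1);
    __m256i cnst32_256 = _mm256_setr_epi32(32, 64, 96, 128, 160, 192, 224, 256);
    __m256i shift      = _mm256_set1_epi32((int)n);
    shift = subs_epu32_sketch(cnst32_256, shift);
    return _mm256_srlv_epi32(ones, shift);
}

/* Inspection helper: store the mask as eight 32-bit words, lowest word first. */
AVX2_FN void bit_mask_avx2_lsb_wide_words(unsigned int n, uint32_t out[8])
{
    _mm256_storeu_si256((__m256i *)out, bit_mask_avx2_lsb_wide(n));
}
```

The compare and the subtract are independent of each other, so the extra cost over the 16-bit saturating version should be small.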