Is it possible to load, let's say, a 2048-bit number into 8 AVX ymm registers and shift bits left and right across all of them?
I only need to shift 1 bit at a time.
I've tried finding accurate info on AVX but the interaction between xmm/ymm/zmm and the carry bit seems unclear a lot of the time.
That's the simple part: there is no interaction. SSE/AVX arithmetic does not involve the flags. There are some specific instructions that compare/test vectors (`ptest`) or scalars in vectors (`comiss` etc.) and then set flags, but they're not that useful here.

One approach is to start at the top of your number instead of the bottom, so that an in-place update never overwrites an element you still need to read. Load two mostly overlapping vectors, with one offset by one element compared to the other, and use one of the "concatenate and shift" instructions (e.g. the `vpshld` family from AVX-512 VBMI2, such as `vpshldq`) to do a left shift that shifts in bits from the previous element instead of zeroes. In general these instructions take the shifted-in bits from the corresponding element of another vector, not from the previous element, which is why the second vector is loaded at a one-element offset. In AVX2 you can emulate this with a left shift, a right shift, and `vpor`.
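The AVX2 emulation described above could be sketched roughly as follows. This is only an illustration under some assumptions that are not in the question: the number is stored in memory as little-endian 64-bit limbs, the limb count is a multiple of 4, and the compiler is GCC or Clang (for the `target` attribute). The names `shl1_scalar` and `shl1_avx2` are made up for this example, and the lowest four limbs are handled with scalar code to avoid reading below the start of the array.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: shift an n-limb little-endian number left by one bit,
   in place, walking from the top limb down. Returns the bit shifted out. */
unsigned shl1_scalar(uint64_t *a, size_t n)
{
    unsigned carry_out = (unsigned)(a[n - 1] >> 63);
    for (size_t i = n - 1; i > 0; i--)
        a[i] = (a[i] << 1) | (a[i - 1] >> 63);
    a[0] <<= 1;
    return carry_out;
}

/* AVX2 sketch (hypothetical helper): same operation, four limbs at a time.
   Assumes n is a nonzero multiple of 4. */
__attribute__((target("avx2")))
unsigned shl1_avx2(uint64_t *a, size_t n)
{
    assert(n >= 4 && n % 4 == 0);
    unsigned carry_out = (unsigned)(a[n - 1] >> 63);

    /* Top-down, so the offset load only ever sees unmodified limbs. */
    for (size_t i = n - 4; i >= 4; i -= 4) {
        /* cur = limbs i..i+3, prev = limbs i-1..i+2 (one-element offset) */
        __m256i cur  = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i prev = _mm256_loadu_si256((const __m256i *)(a + i - 1));
        /* (cur << 1) | (prev >> 63): left shift, right shift, vpor */
        __m256i res  = _mm256_or_si256(_mm256_slli_epi64(cur, 1),
                                       _mm256_srli_epi64(prev, 63));
        _mm256_storeu_si256((__m256i *)(a + i), res);
    }

    /* Lowest block: there is no limb below a[0], so finish in scalar code. */
    shl1_scalar(a, 4);
    return carry_out;
}
```

For a 2048-bit number this would be called as `shl1_avx2(limbs, 32)`. A right shift by one bit would presumably be the mirror image: walk from the bottom up, load the neighbour at a `+1` offset, and combine `srli` by 1 with `slli` by 63.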