In SSE the prefixes 066h (operand size override), 0F2h (REPNE) and 0F3h (REPE) are part of the opcode. In non-SSE code, 066h switches between 32-bit (or 64-bit) and 16-bit operation, while 0F2h and 0F3h are used for string operations. They can be combined, so that 066h and 0F2h (or 0F3h) can be used in the same instruction, because this is meaningful. What is the behavior in an SSE instruction? For instance, we have (ignoring mod/rm for now):
0f 58 addps
66 0f 58 addpd
f2 0f 58 addsd
f3 0f 58 addss
But what is this?
66 f2 0f 58
And how about?
f2 66 0f 58
Not to mention the following which has two conflicting REP prefixes:
f2 f3 0f 58
What is the spec for these?
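To make the question concrete, here is a minimal Python sketch that sorts encodings of opcode 0F 58 into the four documented forms listed above and flags everything else. The names `DOCUMENTED`, `MANDATORY` and `classify` are hypothetical, made up for this illustration. The sketch only says which combinations fall outside the table; it does not say what a real CPU does with them.

```python
# Documented encodings of opcode 0F 58, keyed by the single mandatory prefix
# (None means "no prefix"). This is just the table from the question.
DOCUMENTED = {None: "addps", 0x66: "addpd", 0xF2: "addsd", 0xF3: "addss"}

# The three prefixes that act as part of the opcode in SSE instructions.
MANDATORY = (0x66, 0xF2, 0xF3)


def classify(code: bytes) -> str:
    """Return the documented mnemonic for a 0F 58 encoding, or 'unspecified'.

    Only leading 66/F2/F3 bytes are recognised as prefixes; any other
    prefix byte (e.g. a segment override) raises ValueError.
    """
    i = 0
    while i < len(code) and code[i] in MANDATORY:
        i += 1
    prefixes = code[:i]
    if code[i:i + 2] != b"\x0f\x58":
        raise ValueError("not a 0F 58 opcode")
    if len(prefixes) == 0:
        return DOCUMENTED[None]
    if len(prefixes) == 1:
        return DOCUMENTED[prefixes[0]]
    # Two or more of 66/F2/F3: not covered by the table above.
    return "unspecified"


if __name__ == "__main__":
    for enc in ("0f58", "660f58", "f20f58", "f30f58",
                "66f20f58", "f2660f58", "f2f30f58"):
        print(enc, "->", classify(bytes.fromhex(enc)))
```

The last three encodings in the loop are the ones being asked about; the sketch reports them as unspecified because more than one candidate mandatory prefix is present.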
I do not remember having seen any specification on what you should expect in the case of wildly combining random prefixes, so I guess CPU behaviour may be "undefined" and possibly CPU-specific. (Clearly, some things are specified in e.g. Intel's docs, but many cases aren't covered). And some combinations may be reserved for future use.
My naive assumption would have been that additional prefixes are no-ops, but there's no guarantee. That seems reasonable given that, e.g., some optimisation manuals recommend building a multi-byte NOP (canonically 90h) by prefixing it with 66h.

However, I also know that the CS and DS segment override prefixes have acquired novel functions as SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional jump instructions.

Anyway, I looked at your examples above, always setting XMM1 to all 0 and XMM7 to all 0FFh first and then executing the code in question with xmm1, xmm7 arguments. What I observed (32-bit code on a Win64 system with an Intel T7300 Core 2 Duo) was:

1) no change observed for addsd by adding a 66h prefix
2) no change observed for addss by adding a 0F2h prefix
3) However, I observed a change by prefixing addpd with 0F2h: in this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh.

So my conclusion is that one shouldn't make any assumptions and should expect "undefined" behaviour. I wouldn't be surprised, however, if you could find some clues in Agner Fog's manuals.
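One possible reading of observation 3 (an interpretation, not something measured beyond the values above) is that the result has exactly the shape addsd would produce: only the low 64 bits are written and the high 64 bits of XMM1 are left alone. The Python sketch below models the two instructions on 128-bit integers to show this. The helper names `is_nan`, `add_lane` and `simulate` are hypothetical. The model assumes the usual SSE rule that a NaN operand is returned quieted, with the destination taking priority, and it ignores MXCSR rounding modes and exceptions.

```python
import struct

MASK64 = (1 << 64) - 1
EXP_MASK = 0x7FF0000000000000    # exponent field of an IEEE 754 double
FRAC_MASK = 0x000FFFFFFFFFFFFF   # fraction field
QUIET_BIT = 0x0008000000000000   # top fraction bit marks a quiet NaN


def is_nan(bits: int) -> bool:
    """True if the 64-bit pattern encodes a NaN."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def add_lane(dst: int, src: int) -> int:
    """Add two doubles given as bit patterns and return a bit pattern.

    Assumption: a NaN operand is propagated (quieted), destination first.
    """
    if is_nan(dst):
        return dst | QUIET_BIT
    if is_nan(src):
        return src | QUIET_BIT
    a = struct.unpack("<d", struct.pack("<Q", dst))[0]
    b = struct.unpack("<d", struct.pack("<Q", src))[0]
    return struct.unpack("<Q", struct.pack("<d", a + b))[0]


def simulate(mnemonic: str, xmm1: int, xmm7: int) -> int:
    """Model 'mnemonic xmm1, xmm7' on 128-bit integers; returns new xmm1."""
    lo = add_lane(xmm1 & MASK64, xmm7 & MASK64)
    if mnemonic == "addpd":
        hi = add_lane(xmm1 >> 64, xmm7 >> 64)   # both lanes are written
    elif mnemonic == "addsd":
        hi = xmm1 >> 64                          # high lane is untouched
    else:
        raise ValueError("only addpd and addsd are modelled")
    return (hi << 64) | lo


if __name__ == "__main__":
    xmm1, xmm7 = 0, (1 << 128) - 1   # the test values used above
    for m in ("addpd", "addsd"):
        print(m, format(simulate(m, xmm1, xmm7), "032X"))
```

With the test values above, the modelled addsd gives 0000000000000000FFFFFFFFFFFFFFFF and the modelled addpd gives all FFh, which matches the two values reported in observation 3. That is consistent with the 0F2h prefix taking precedence over 66h on that particular CPU, but it proves nothing about other processors.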