I have two packed quadword integers in xmm0 and I need to add them together and store the result in a memory location. I can guarantee that the value of each integer is less than 2^15. Right now, I'm doing the following:
int temp;
....
movdq2q mm0, xmm0
psrldq xmm0, 8
movdq2q mm1, xmm0
paddq mm0,mm1
movd temp, mm0
Is there a better way to do this?
First off, why are you using quadwords to represent values that would fit in a 16-bit format? Leaving that aside, a couple solutions:
or
or
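As a rough illustration of staying in the XMM registers for the whole operation, here is a minimal sketch in C using SSE2 intrinsics from `<emmintrin.h>`. The function name `hsum_two_qwords` is made up for this example, and it assumes, as the question states, that each value is below 2^15 so the low 32 bits of the sum are sufficient.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: adds the two 64-bit lanes of v and returns
   the low 32 bits of the sum. */
static int hsum_two_qwords(__m128i v)
{
    /* psrldq by 8 bytes: bring the high quadword down into lane 0. */
    __m128i hi = _mm_srli_si128(v, 8);
    /* paddq: lane-wise 64-bit add, so lane 0 now holds low + high. */
    __m128i sum = _mm_add_epi64(v, hi);
    /* movd: take the low 32 bits, which is enough for inputs < 2^15. */
    return _mm_cvtsi128_si32(sum);
}
```

Because no MMX register is touched here, this version does not need an emms afterwards.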
Note that you don't actually need to use paddq; you can get away with one of the narrower adds if you prefer.
Edit: as for summing four double quadwords, what you have is pretty much fine. Given that you know that all the data in them fits into the low doubleword of each slot, you could try something like:
which may or may not prove to be faster.
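One possible way to exploit the "fits in the low doubleword" guarantee, sketched in C with SSE2 intrinsics. This is an assumption about the goal rather than a definitive version: it takes the task to be a single total over all eight quadword lanes of four registers, and it assumes that total also fits in 32 bits. The name `sum_four_vectors` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: total of the eight quadword lanes in a, b, c, d.
   Assumes every lane value sits in its low doubleword (upper doubleword
   zero) and that the total fits in 32 bits. */
static int sum_four_vectors(__m128i a, __m128i b, __m128i c, __m128i d)
{
    /* paddd is safe here: the upper doublewords are zero and nothing carries. */
    __m128i ab  = _mm_add_epi32(a, b);
    __m128i cd  = _mm_add_epi32(c, d);
    __m128i all = _mm_add_epi32(ab, cd);
    /* Fold the high quadword onto the low one, then extract the result. */
    __m128i hi  = _mm_srli_si128(all, 8);
    return _mm_cvtsi128_si32(_mm_add_epi32(all, hi));
}
```

Doing the vertical adds first means only one shift and one extract for the whole batch; whether that beats the per-register approach is something to measure.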
As for EMMS, it's just another instruction. After any code that touches the MMX registers, and before any code that uses the x87 floating-point instructions, you need to have emms.
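To show where the emms goes, here is a sketch of the MMX sequence from the question written in C intrinsics, with `_mm_empty()` (which emits emms) as the last MMX-related step. The function name `hsum_via_mmx` is made up, and the sketch assumes a compiler that still provides the MMX intrinsics (GCC and Clang do; 64-bit MSVC does not).

```c
#include <mmintrin.h>   /* MMX intrinsics, including _mm_empty */
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: same result as the question's MMX sequence. */
static int hsum_via_mmx(__m128i v)
{
    __m64 lo = _mm_movepi64_pi64(v);                     /* movdq2q mm0, xmm0 */
    __m64 hi = _mm_movepi64_pi64(_mm_srli_si128(v, 8));  /* psrldq, then movdq2q */
    int result = _mm_cvtsi64_si32(_mm_add_si64(lo, hi)); /* paddq, then movd */
    _mm_empty();  /* emms: release the MMX state before any x87 code runs */
    return result;
}
```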