First of all, I'd like to say that I'm new to ASM and if this is a stupid question please excuse it.
I read about partial register stalls in Agner Fog's microarchitecture manual (this seems a little advanced, but I was curious why 32-bit instructions in 64-bit mode zero the top half of the register). Example 6.13 gives a solution for avoiding a register stall. I am still a bit confused about this: why was an OR operation not used instead of the MOV, like so:
xor eax, eax
mov al, byte [mem8]
; or al, byte [mem8] ; why not this?
I think the effect is the same. Do they both take the same number of clock cycles? Is one more efficient than the other? Is there something "under the hood" that would make me prefer one over the other?
Partial register access in 64-bit mode
In 64-bit mode, the following rules apply when accessing registers narrower than 64 bits:
If only an 8-bit register is written, the old value of the associated 64-bit register must first be obtained, the 8-bit sub-register changed, and then the new value written back.
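To make the rules concrete, here is a minimal sketch in C that models only the architectural result of the instruction sequences under discussion, not their timing. The `Reg` type and all function names are hypothetical helpers invented for this illustration. The model assumes the two behaviours mentioned in this thread: a 32-bit write zeroes the upper half of the 64-bit register, and an 8-bit write merges into the old value.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit general-purpose register such as RAX. */
typedef struct { uint64_t r64; } Reg;

/* Writing the 32-bit sub-register (EAX) zeroes bits 32..63. */
static void write32(Reg *r, uint32_t v) { r->r64 = (uint64_t)v; }

/* Writing the low 8-bit sub-register (AL) keeps bits 8..63 of the old value. */
static void write8(Reg *r, uint8_t v) {
    r->r64 = (r->r64 & ~UINT64_C(0xFF)) | (uint64_t)v;
}

/* Reading AL returns the low 8 bits. */
static uint8_t read8(const Reg *r) { return (uint8_t)(r->r64 & 0xFF); }

/* xor eax, eax ; mov al, byte [mem8] */
uint64_t xor_then_mov(uint64_t old, uint8_t mem8) {
    Reg a = { old };
    write32(&a, 0);      /* xor eax, eax */
    write8(&a, mem8);    /* mov al, byte [mem8] */
    return a.r64;
}

/* xor eax, eax ; or al, byte [mem8] */
uint64_t xor_then_or(uint64_t old, uint8_t mem8) {
    Reg a = { old };
    write32(&a, 0);                              /* xor eax, eax */
    write8(&a, (uint8_t)(read8(&a) | mem8));     /* or reads AL, then writes it */
    return a.r64;
}

/* movzx eax, byte [mem8] */
uint64_t movzx32(uint64_t old, uint8_t mem8) {
    Reg a = { old };
    write32(&a, (uint32_t)mem8);
    return a.r64;
}

/* mov al, byte [mem8] alone, without zeroing the register first */
uint64_t mov_only(uint64_t old, uint8_t mem8) {
    Reg a = { old };
    write8(&a, mem8);
    return a.r64;
}
```

In this model, `xor_then_mov`, `xor_then_or` and `movzx32` all leave the same value in the register for any starting contents, while `mov_only` keeps the stale upper bits. Note that `xor_then_or` has to call `read8` before it can write, which is the extra read of the register that `or` needs and `mov` does not. Any difference between the first three variants is therefore one of performance, not of result.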
Example 6.13 from Agner Fog's microarchitecture manual is not related to this; it is only an alternative to movzx, because that instruction is slow on older Pentium processors.
mov or or?
The two lines
xor eax, eax
mov al, byte [mem8]
are probably faster than if you replaced the second line with
or al, byte [mem8]
since there is a dependency on the previous line: only once xor eax, eax has been executed can the new value in eax be passed on to or. In addition, just as with the mov variant, there may be a slowdown because only a partial register is accessed. Instead, I would suggest replacing these two lines with
movzx eax, byte [mem8]
This is one byte shorter than the previous approach and is also a single instruction that writes a full 32-bit register. As Agner Fog said