Can I substitute a MOV operation with an OR operation?

First of all, I'd like to say that I'm new to ASM and if this is a stupid question please excuse it.

I read in Agner Fog's microarchitecture manual about partial register stalls (this seems a little bit advanced, but I was curious why 32-bit instructions in 64-bit mode zero the top half of the register). Example 6.13 gives a solution for how to avoid a register stall. I am still a bit confused about this: why was an OR operation not used instead of the MOV, like this:

xor eax, eax
mov al, byte [mem8]
; or  al, byte [mem8] ; why not this?

I think the effect is the same. Do they both take the same number of clock cycles? Is one more efficient than the other? Is there something "under the hood" that would make me prefer one over the other?

There is 1 best solution below

Partial register access in 64-bit mode

In 64-bit mode, the following rules apply when writing to a register that is narrower than 64 bits:

  • If a 32-bit register is written, the upper 32 bits of the associated 64-bit register are cleared.
  • If a 16- or 8-bit register is written, the upper 48 or 56 bits of the associated 64-bit register remain unchanged.

If only an 8-bit register is written, the old value of the associated 64-bit register must first be obtained, the 8-bit sub-register changed, and then the new value saved. The write therefore depends on the previous contents of the full register.
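To make these rules concrete, here is a minimal sketch. It assumes NASM on x86-64 Linux; the constants are arbitrary and the build command is only one possible toolchain. The comments show the architectural value of rax after each write.

```nasm
; Sketch only: assumes NASM on x86-64 Linux, constants chosen arbitrarily.
; Possible build: nasm -f elf64 demo.asm && ld -o demo demo.o
default rel
global _start

section .text
_start:
    mov rax, 0x1122334455667788   ; rax = 0x1122334455667788
    mov eax, 0xAABBCCDD           ; 32-bit write: rax = 0x00000000AABBCCDD

    mov rax, 0x1122334455667788   ; restore the full 64-bit value
    mov ax, 0xEEFF                ; 16-bit write: rax = 0x112233445566EEFF
    mov al, 0x99                  ; 8-bit write:  rax = 0x112233445566EE99

    mov eax, 60                   ; Linux exit syscall number
    xor edi, edi                  ; exit status 0
    syscall
```

Stepping through this in a debugger should show the upper half cleared only by the 32-bit write.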

Example 6.13 from Agner Fog's microarchitecture manual is not related to this. It is only an alternative to movzx, because that instruction is slow on older Pentium processors.

mov or or?

The two lines

31 C0                   xor eax, eax
8A 05 ## ## ## ##       mov al, byte [mem8]

(opcodes on the left) are probably faster than if you replaced the second line with

0A 05 ## ## ## ##       or  al, byte [mem8]

since there is a dependency on the previous line: or reads al as a source operand, so it can only execute once the result of xor eax, eax is available. In addition, just as with the mov variant, there may be a slowdown because only a partial register is written. Instead, I would suggest replacing these two lines with

0F B6 05 ## ## ## ##    movzx eax, byte [mem8]

This is one byte shorter than the previous approach, and it is a single instruction that writes a full 32-bit register. As Agner Fog said:

The easiest way to avoid partial register stalls is to always use full registers and use MOVZX or MOVSX when reading from smaller memory operands.
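As a rough illustration of that advice, the sketch below loads narrow memory operands with movzx and movsx. It assumes NASM on x86-64 Linux; the labels mem8 and mem16 and their values are made up for the example.

```nasm
; Sketch only: assumes NASM on x86-64 Linux; mem8/mem16 are hypothetical data.
default rel
global _start

section .data
mem8:  db 0x80                    ; sample byte with the sign bit set
mem16: dw 0x8001                  ; sample word with the sign bit set

section .text
_start:
    movzx eax, byte [mem8]        ; zero-extend: rax = 0x0000000000000080
    movsx eax, byte [mem8]        ; sign-extend to 32 bits: rax = 0x00000000FFFFFF80
    movsx rax, byte [mem8]        ; sign-extend to 64 bits: rax = 0xFFFFFFFFFFFFFF80
    movzx eax, word [mem16]       ; zero-extend a word: rax = 0x0000000000008001

    mov eax, 60                   ; Linux exit syscall number
    xor edi, edi                  ; exit status 0
    syscall
```

Each load writes at least a full 32-bit register, so none of them has to merge with the old contents of rax.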