Why is x86 MOV two bytes, not one? How does the opcode and machine code work?

I'm having trouble understanding a very basic x86 instruction. The instruction is

0x080491d7 <+1>:     mov    %esp,%ebp

I know that it moves the value of esp into ebp. But I'm trying to understand the opcodes. The instruction is 2 bytes long, not 1 which I'm confused about. I would've thought it would only be 1 byte.

The memory for this instruction is:

0x80491d7 <main+1>:     0x89    0xe5

I know that 0x89 is one of the opcodes for MOV. I've been reading the Intel manuals. I don't know what 0xe5 is for. Is it like a suffix or another opcode value or something else? The Intel manual is a little confusing.

The C program is compiled for 32-bit x86, and the Linux server is x86_64.

There are 2 best solutions below

I know that 0x89 is one of the opcodes for MOV. I've been reading the intel manuals. I don't know what 0xe5 is for. Is it like a suffix or another opcode value or something else? The intel manual is a little confusing.

You found that the mov %esp, %ebp instruction got encoded with 2 bytes: 0x89, 0xE5.

Consulting the Intel manuals is the right thing to do, but I would advise looking at your instruction in Intel syntax, mov ebp, esp (destination first), because that is the operand order the manuals use. It might save you from an inadvertent error when interpreting the opcode tables.

Looking up 89h in the one-byte opcode table, you will find the operand notation "Ev, Gv".

The "Using opcode tables" chapter explains what these character combinations mean.

E --- A ModR/M byte follows the opcode and specifies the operand.
v --- Word or doubleword, depending on operand-size attribute.
, --- Literally a separating comma.
G --- The reg-field within the ModR/M byte selects a general purpose register.

So that second byte is a ModR/M byte.

Your ModR/M byte is E5h or 11'100'101b in binary notation following the grouping 'mod-reg-r/m'.

  • Because of the mention "Gv", the reg-field (100b) refers to a (d)word-sized general purpose register. It could be referring to SP, or ESP.
  • Because the 2 most significant bits (11b) are set in the ModR/M byte, the 3 least significant bits (101b) refer to a register. And because of the mention "Ev", it could be referring to BP, or EBP.
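The field split described above can be checked with a few lines of Python. This is only a minimal sketch: the helper name split_modrm and the REG32 list are made up for illustration, and the register names assume the 32-bit operand size that the answer settles on further down.

```python
# 32-bit general-purpose registers, indexed by their 3-bit register number.
# (Illustrative table; with a 16-bit operand size the same numbers mean ax, cx, ...)
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # bits 7..6
    reg = (byte >> 3) & 0b111   # bits 5..3
    rm = byte & 0b111           # bits 2..0
    return mod, reg, rm

mod, reg, rm = split_modrm(0xE5)
print(f"mod={mod:02b} reg={reg:03b} r/m={rm:03b}")

# mod == 11b means the r/m field names a register rather than a memory operand.
if mod == 0b11:
    print("reg field ->", REG32[reg])
    print("r/m field ->", REG32[rm])
```

For 0xE5 this gives mod=11, reg=100 (esp) and r/m=101 (ebp), matching the grouping 11'100'101b.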

Which registers? For that we look at the opcode 89h or 100010'0'1b in binary notation following the grouping 'TTTTTT-d-w'.

Bit 0 (w) tells us this is a (d)word-sized operation (which accords with the mention "v" above). Since this is 32-bit code and no operand size prefix (0x66) was used, what remains is ESP/EBP.

Bit 1 (d) tells us which of these operands is the source and which is the destination (which accords with the mention "E,G" above). Since this bit is 0, the reg field (ESP) indicates the source and the r/m field (EBP) indicates the destination. With the d-bit set it would be the other way round, meaning the bytes 0x8B, 0xEC would be an equally valid encoding for your instruction mov %esp, %ebp.
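To see the d and w bits at work, here is a small Python sketch that decodes only the register-to-register forms of MOV discussed here (opcodes 88h to 8Bh with mod = 11b). The function name decode_mov_reg_reg is hypothetical, and the sketch assumes 32-bit code with no operand-size prefix; it is not a general disassembler.

```python
# 32-bit general-purpose registers, indexed by their 3-bit register number.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_reg_reg(opcode, modrm):
    """Decode a register-to-register MOV and return it in Intel syntax."""
    # The top six opcode bits must be 100010b (the MOV r/m,reg / reg,r/m group).
    if opcode & 0b11111100 != 0b10001000:
        raise ValueError("not a MOV from the 88h..8Bh group")
    d = (opcode >> 1) & 1   # direction: 0 = reg is source, 1 = reg is destination
    w = opcode & 1          # width: 0 = byte, 1 = (d)word
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b11 or w != 1:
        raise ValueError("this sketch only handles 32-bit register operands")
    dst, src = (reg, rm) if d else (rm, reg)
    return f"mov {REG32[dst]}, {REG32[src]}"

print(decode_mov_reg_reg(0x89, 0xE5))   # d = 0: r/m is the destination
print(decode_mov_reg_reg(0x8B, 0xEC))   # d = 1: reg is the destination
```

Both calls yield mov ebp, esp in Intel syntax, which is mov %esp, %ebp in the AT&T syntax used in the question.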

The instruction is 2 bytes long, not 1 which I'm confused about.

Yes. In the description of the MOV instruction in the Intel Developer Manual, volume 2, the encoding for this form (MOV r/m32, r32) is listed as 89 /r. According to section 3.1.1.1, "Opcode Column in the Instruction Summary Table", /r indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand. So the second byte is the ModR/M byte. Its meaning can be found in Table 2-2, "32-Bit Addressing Forms with the ModR/M Byte".