This is a question about the operand-size override prefixes in the x86-64 (AMD64) architecture.
Here is a bunch of assembler instructions (NASM) and their encodings; by ‘new’ I mean the r8, ..., r15 registers:
                                              67: address-size override prefix
                                              |
                                              |  4x: operand-size override prefix
                                              |  |
; Assembler   ; | Dst operand | Src operand | -- --
mov eax,ecx   ; | 32-bit      | 32-bit      |       89 C8 |
mov r8d,ecx   ; | 32-bit new  | 32-bit      |    41 89 C8 |
mov eax,r9d   ; | 32-bit      | 32-bit new  |    44 89 C8 |
mov r8d,r9d   ; | 32-bit new  | 32-bit new  |    45 89 C8 |
mov rax,rcx   ; | 64-bit      | 64-bit      |    48 89 C8 |
mov r8,rcx    ; | 64-bit new  | 64-bit      |    49 89 C8 |
mov rax,r9    ; | 64-bit      | 64-bit new  |    4C 89 C8 |
mov r8,r9     ; | 64-bit new  | 64-bit new  |    4D 89 C8 |
lea eax,[ecx] ; | 32-bit      | 32-bit      | 67    8D 01 |
lea r8d,[ecx] ; | 32-bit new  | 32-bit      | 67 44 8D 01 |
lea eax,[r9d] ; | 32-bit      | 32-bit new  | 67 41 8D 01 |
lea r8d,[r9d] ; | 32-bit new  | 32-bit new  | 67 45 8D 01 |
lea rax,[rcx] ; | 64-bit      | 64-bit      |    48 8D 01 |
lea r8,[rcx]  ; | 64-bit new  | 64-bit      |    4C 8D 01 |
lea rax,[r9]  ; | 64-bit      | 64-bit new  |    49 8D 01 |
lea r8,[r9]   ; | 64-bit new  | 64-bit new  |    4D 8D 01 |
push rax      ; |             | 64-bit      |       50    |
push r8       ; |             | 64-bit new  |    41 50    |
From studying these and the same instructions with other registers, I deduce the following. There is a pairing between ‘old’ and ‘new’ registers. Non-exhaustively:
AX <--> R8
CX <--> R9
DX <--> R10
BX <--> R11
BP <--> R13
Ignoring the size prefix, the instruction bytes do not refer to particular registers, but to pairs of registers. As an example: the bytes 89 C8 indicate a mov instruction from a source which is either ecx, rcx, r9d, or r9, to a destination which is either eax, rax, r8d, or r8. Given that the operands must be both 32- or 64-bits wide, there are eight legal possible combinations. The operand-size override prefix (or absence thereof) indicates which of those combinations is the intended one. For instance if the prefix is present and is 44, then the source operand must be a 32-bit new register (in this example then collapsing to r9d) and the destination must be a 32-bit old register (here then signalling eax).
I may not have got it totally right, but I think I get the gist of it. It would appear then that what the operand-size override prefixes do override is the fact that without them the instruction would use 32-bit ‘old’ operands.
But for sure, there is something that escapes me, otherwise: what sense then does it make to talk about “a version of x86-64 with a default operand-size of 64-bit” (like here)?
Or is there a way, running on a 64-bit machine, to set the default operand size to either 32 or 64, and if so, and if my program set the machine appropriately, I would see different encodings?
Also: when would the 66H operand-size override prefix be used?
Yes, in 64-bit machine code the default operand-size is 32-bit for most instructions, 64-bit for stack and jump/call instructions, and also 64-bit for loop and jrcxz. (And the default address-size is 64-bit, so add eax, [rdi] is a 2-byte instruction, no prefixes.) And no, the defaults are not changeable: you can't have a 2-byte add rax, rdx.
Operand-size encoding in 64-bit mode
64-bit operand-size: a REX prefix with the W bit set (0x4? with the high bit set in the low nibble, 48..4f). A REX prefix with the W bit cleared can never override the operand-size to 32-bit for opcodes where it defaults to something else (like push).
16-bit operand-size: a 0x66 prefix, like imul ax, [r8], 123
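As a rough illustration of the REX byte layout described above, here is a minimal Python sketch that splits a REX prefix into its four flag bits. The function name rex_fields is hypothetical, made up for this example; the bit positions (W, R, X, B in the low nibble, high to low) are the standard layout.

```python
def rex_fields(byte):
    """Split a REX prefix byte (0x40..0x4F) into its W, R, X, B bits.

    Hypothetical helper for illustration only, not part of any library.
    """
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand-size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModRM rm, SIB base, or opcode reg
    }


if __name__ == "__main__":
    # 0x41 and 0x49 differ only in the W bit; 0x47 has R, X and B all set.
    for b in (0x41, 0x49, 0x47):
        print(f"{b:#04x}", rex_fields(b))
```

Only the W bit affects operand-size; the other three bits just supply the extra high bit for register numbers, which is why 41..47 leave the operand-size at its default.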
(In other modes, there is no REX, and 66 sets it to whatever the non-default is.)
Fun fact: loop and jrcxz are overridden to use ECX instead of RCX implicitly by an address-size prefix, not operand-size. IIRC, this makes some sense because the operand-size attribute of a branch affects whether it truncates EIP to IP or not.
For example, GNU .intel_syntax disassembly of those NASM-syntax examples from above.
Note the imul example used a "high" register so it needed a REX prefix to signal R8, separate from needing a 66 prefix to signal 16-bit operand-size. The W bit is not set in the REX prefix: it's 0x41, not 0x49.
It doesn't make sense to have both REX.W and a 0x66 prefix. It seems that REX.W "wins" in that case. Single-stepping 66 48 05 40 e2 01 00 (disassembled as data16 add rax,0x1e240) in Linux GDB on an i7-6700k (Skylake), the single-step leaves RIP pointing to the end of that whole instruction (and adds the full immediate to RAX), rather than decoding it as add ax, 0xe240 and leaving RIP pointing into the middle of the 4-byte immediate. (A 66 prefix is length-changing for that opcode, like most that have a 32-bit immediate which becomes 16-bit. See https://agner.org/optimize/ re: LCP stalls.)
I got NASM to emit that from o16 add rax, 123456. REX prefixes in general are normal and fine with a 66 prefix, e.g. to encode add r8w, [r15 + r12*4], needing all 3 other bits to be set in the REX's low nibble.
32-bit address-size: a 0x67 prefix, like add eax, [edx].
It can of course be combined with operand-size stuff, totally orthogonal.
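To make the orthogonality concrete, here is a minimal Python sketch (not a real disassembler) that scans the prefixes of a 64-bit-mode instruction and reports the effective operand-size and address-size. It assumes the opcode is one whose default operand-size is 32-bit, so it does not model push/pop/call/jmp or byte-sized opcodes. The name effective_sizes is hypothetical.

```python
# Legacy prefix bytes that may precede an optional REX prefix.
LEGACY_PREFIXES = {0x66, 0x67, 0xF0, 0xF2, 0xF3,
                   0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65}


def effective_sizes(code):
    """Return (operand_bits, address_bits) for 64-bit-mode machine code.

    Assumption: the opcode defaults to 32-bit operand-size.
    """
    has_66 = has_67 = False
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        has_66 = has_66 or code[i] == 0x66
        has_67 = has_67 or code[i] == 0x67
        i += 1

    # In 64-bit mode, a byte 0x40..0x4F right before the opcode is a REX prefix.
    rex_w = i < len(code) and code[i] & 0xF0 == 0x40 and bool(code[i] & 0x08)

    if rex_w:
        operand = 64   # REX.W takes precedence over 66
    elif has_66:
        operand = 16
    else:
        operand = 32
    address = 32 if has_67 else 64
    return operand, address


if __name__ == "__main__":
    print(effective_sizes(bytes.fromhex("67 44 8D 01")))           # (32, 32)
    print(effective_sizes(bytes.fromhex("66 48 05 40 e2 01 00")))  # (64, 64)
```

The two results are computed independently: 67 only ever changes the second number, and 66 / REX.W only the first.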
Normally 32-bit address size is only useful for the Linux x32 ABI (ILP32 in long mode to save cache footprint on pointer-heavy data structures) where you may want to truncate high garbage from a pointer to make sure address math correctly wraps to stay in the low 4GiB, even with 32-bit negative numbers.
In other modes, 67 sets address size to the non-default. 16-bit address-size also implies 16-bit interpretation of the ModRM byte, so only [bx|bp + si|di] are allowed, no SIB byte to allow the flexibility of 32 / 64-bit addressing.
Modes and sets of defaults
No, the defaults can't be changed in 64-bit mode. Different bits in the GDT entry selected by CS (or any other method) won't matter. AFAIK, the table in https://en.wikipedia.org/wiki/X86-64#Operating_modes is a complete list of the possible combinations of modes and default operand/address sizes.
There's only one set of settings that allows 64-bit operand-size at all. It's not possible even in any legacy mode to have a combo like 16-bit operand, 32-bit address size.
This makes some sense from a hardware-complexity perspective. The more different combos of things it needs to support, the more transistors might be involved in an already complex and power-intensive part of the CPU.
(Although the default stack address size used implicitly by push/pop is selected independently by the SS selector, IIRC. So I think you can have normal 32-bit mode where add eax, [edx] is 2 bytes, except with push/pop/call/ret using ss:sp instead of ss:esp. Not something I've ever tried setting up.)
Note that 16-bit AX corresponds to 16-bit R8W, while RAX and R8 are the pair distinguished by a REX prefix.
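The register pairing can be sketched in a few lines of Python. This is an illustrative decoder for just one case, opcode 89 (mov r/m, r) with a register-direct ModRM byte; decode_mov_89 and the name tables are hypothetical helpers. It shows that REX.R and REX.B each supply a fourth bit for a 3-bit register number, while REX.W alone selects 64-bit operand-size.

```python
NAMES32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"] + \
          [f"r{n}d" for n in range(8, 16)]
NAMES64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + \
          [f"r{n}" for n in range(8, 16)]


def decode_mov_89(code):
    """Decode [REX] 89 ModRM (register-direct only) into NASM-style text."""
    i = 0
    rex = 0
    if code[i] & 0xF0 == 0x40:      # optional REX prefix (64-bit mode)
        rex = code[i]
        i += 1
    if code[i] != 0x89:
        raise ValueError("only opcode 89 is handled in this sketch")
    modrm = code[i + 1]
    if modrm >> 6 != 0b11:
        raise ValueError("only register-direct ModRM is handled")

    reg = (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7)  # REX.R : ModRM.reg
    rm = ((rex & 1) << 3) | (modrm & 7)                 # REX.B : ModRM.rm
    names = NAMES64 if rex & 0x08 else NAMES32          # REX.W picks 64-bit
    return f"mov {names[rm]},{names[reg]}"              # 89 is mov r/m, reg


if __name__ == "__main__":
    for enc in ("89 C8", "41 89 C8", "44 89 C8", "4D 89 C8"):
        print(enc, "->", decode_mov_89(bytes.fromhex(enc)))
```

Run against the encodings from the question, the same 89 C8 bytes decode to different registers purely because of which REX bits are set.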
In assembly source, there's no default: operand-size must be implied by a register or specified explicitly.
The exceptions are some assemblers having a default for push/pop, and a few bad assemblers that have a default for other cases, including the GNU assembler for things like add $1, (%rdi) defaulting to dword, with a warning only in recent versions. GAS does error on ambiguous mov, strangely. clang's built-in assembler is better, erroring on any ambiguous operand-size.