If I have a quadword named `a`, the instruction `MOV RAX, a + 3` offsets the address of `a` by 3.
When I do the same thing with a constant (`a EQU 10`), `MOV RAX, a + 3` moves the value 13 into RAX.
Why does the CPU do that if it is the same `+` sign and the same `MOV` instruction?
I wrote this code and, while debugging, found this strange behavior. I just can't understand why, and I can't find the answer online.
BTW, I'm using MASM.
The CPU runs machine code, not asm source.
In MASM, the same instruction syntax means different things depending on whether the operand is a "variable" (a symbol defined by a label) or an assemble-time constant (`equ` or `a = 10`).

Other assemblers (like NASM) don't have that inconsistency, so `mov rax, a+3` would always be a `mov rax, imm64` of an immediate constant, whether that's an address (link-time constant) or an assemble-time `equ` constant. (In MASM, that would be `mov rax, OFFSET a+3`.)

(NASM would actually optimize it to `mov eax, 13`, not the 10-byte `mov r64, imm64`, but still a mov-immediate either way.)

The inconsistency isn't in `+`; it always does addition.

When `a` is a symbol / label attached to a `dq`, `a+3` is that symbol's address plus three.

When `a` is an assemble-time constant defined with `equ` or `a = 10`, `a+3` is `10+3`.

A symbol's "value" is its address, like if you declared it in C as `extern char a[]`. In MASM, if you did `dq a+3`, it would basically work the same whether it's a label or an `equ` constant: add 3 to a link-time constant (address) or assemble-time constant (`equ`), and assemble those 8 bytes into the output file at the current position.

The inconsistency is in how operands to asm instructions work in MASM: see Confusing brackets in MASM32.
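As a rough C analogy for the `dq a+3` point (not MASM, just a sketch): a byte array stands in for `a dq 10` on little-endian x86, and an enum constant stands in for `a EQU 10`. The names `A_EQU`, `label_plus_3`, and `equ_plus_3` are made up for illustration. Both initializers are constants resolved before the program runs, one from an address and one from a plain number.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Stand-in for `a dq 10`: eight bytes of storage with a label on them.
   As an array, the name `a` evaluates to an address, like `extern char a[]`. */
unsigned char a[8] = {10};   /* first byte 10, the rest zero */

/* Stand-in for `a EQU 10`: a number known at compile time, with no storage. */
enum { A_EQU = 10 };

/* Like `dq a+3` when `a` is a label: the address of `a` plus 3,
   a constant filled in at link/load time. Nothing is read from memory. */
unsigned char *const label_plus_3 = a + 3;

/* Like `dq a+3` when `a` is an equ constant: plain 10+3. */
const uint64_t equ_plus_3 = A_EQU + 3;

/* The constant case can even be checked at compile time. */
static_assert(A_EQU + 3 == 13, "constant expression is folded before run time");
```

In both cases `+` just adds 3; what differs is whether the thing being added to is an address or a number.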
`mov rax, [a]` or `[a+3]` is a load from the address given by the expression `a` or `a+3`, if `a` is a label. If `a` were an assemble-time constant, the instruction would be a mov-immediate.

If you wanted to add to data being loaded from memory, like C `uint64_t tmp = a + 3;` where `a` is a global variable, you'd have to `mov rax, [a]` / `add rax, 3`.

But if `a` is a compile-time constant like C++ `static const uint64_t a = 10;` or `#define a 10`, then the compiler can do the addition at compile time, like `mov eax, 10+3`, which assembles the same as `mov eax, 13`.

The only x86 instruction that can load something from memory and add to the load result is `add reg, [mem]`. Like `mov eax, 3` / `add rax, [a]`. (But that's less efficient than load + `add reg, imm`: larger code size and more back-end uops.)

(Intel's proposed APX (Advanced Performance Extensions) will introduce EVEX encodings of integer instructions, making `add rax, [a], 3` possible, with a register destination separate from the two sources.)

Any normal memory operand can use an addressing mode which can involve some address math, but address math is separate from math on data operands. (Just like you can't do `mov rax, rcx + 3`; you need an `add` instruction to do math on values, or an LEA to copy-and-add.) A different part of the CPU (the load/store execution units) handles address math like `[rdx + rcx*8 + 3]`, and it gets encoded differently in the machine code.

Perhaps that's what you're thinking of as an inconsistency, if you're thinking of `a dq 10` as giving `a` the value `10` the same way `a equ 10` does. It doesn't; it puts those bytes in data memory. That's similar but different: the `10` isn't an assemble-time constant, so you can't use its value in expressions, and it's only accessible with load/store instructions.

BTW, `mov rax, a+3` doesn't involve any run-time address math, at least not for the `+3`. The linker resolves `a+3` to a RIP-relative addressing mode just like with `a`, but offset by 3. So for example it might be `[rip + 1013h]` for `a` vs. `[rip + 1016h]` for `a+3`.

PS: I mentioned NASM a few times as a point of comparison.
MASM has a concept of "variables", like `a dq 10` in a .data section, and it actually implies an operand-size when you use it like `add a, 1`. Other assemblers don't have that high-level concept; they just have labels/symbols you can put before or after data, which you can use to implement static variables.
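To tie the load-then-add discussion back to C, here is a small sketch of the three cases: math on a value that lives in memory, math on a constant, and math on an address. The names `a_var`, `A_CONST`, and the three functions are hypothetical, and the asm in the comments is only roughly what an optimizing compiler might emit, not a guarantee.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical global standing in for `a dq 10`: the 10 lives in data memory. */
uint64_t a_var = 10;

/* Hypothetical constant standing in for `a EQU 10`: no storage at all. */
#define A_CONST 10

/* Math on loaded data: the value must be read at run time and then added to,
   roughly `mov rax, [a]` / `add rax, 3`. */
uint64_t load_then_add(void) { return a_var + 3; }

/* Math on a constant: the addition can be done at compile time,
   roughly `mov eax, 13`. */
uint64_t folded_constant(void) { return A_CONST + 3; }
static_assert(A_CONST + 3 == 13, "folded without touching memory");

/* Math on an address: no load happens, only the pointer moves by 3 bytes,
   roughly `lea rax, [a+3]`. */
const unsigned char *address_plus_3(void) {
    return (const unsigned char *)&a_var + 3;
}
```

Only `load_then_add` changes its result when `a_var` is modified at run time; the other two are fixed once the program is built and loaded.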