Why do parentheses do different things based on context in AT&T syntax?

I'm taking a class that deals with assembly. A few friends and I are debating what the difference is between %rdi and (%rdi) in the following contexts:

Let's say RDI holds the character value 'w' in ASCII form. To the best of my knowledge, if we do %rdi, we will be dealing with the value 'w' itself, whereas if we do (%rdi), we will be dealing with the value pointed to by 'w' (or more specifically, the value at the address numbered the ASCII code of 'w', i.e. absolute address 0x0000000000000077).

Similarly, if RDI holds the value 0x1007bf, I believe %rdi gives us the numerical value 0x1007bf itself, while (%rdi) gives us the value at the address 0x1007bf...

What is the correct interpretation?

Note: all of these are in the context of the mov and cmp instructions.

There are 2 best solutions below

You're essentially correct: %rdi is the register direct addressing mode, while (%rdi) is register indirect. Two things are worth noting, though:

  1. Source vs. target: both forms, %rdi and (%rdi), can appear in the source position, the target position, or a combined source/target position. That means a read, a write, or a read followed by a write.  (So what you've said is accurate for reading.)

  2. Operand size: %rdi alone specifies a 64-bit operand, because it is a 64-bit register.  (%rdi), by contrast, specifies an effective address but not an operand size. It says to use the 64-bit register for the dereference (contrast with (%edi), which uses the 32-bit register), but the size of the item at that memory address is left unspecified. We need the larger context (the opcode or the other operand) to know whether the memory access is a byte, word, dword, or qword.
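To make both points concrete, here is a small sketch in GNU as (AT&T) syntax. It is a fragment meant to be read and assembled, not a complete program, and it assumes RDI already holds a valid, readable and writable address.

```asm
        .text
        # Assumption: %rdi holds a valid, readable and writable address.
        movq    %rdi, %rax        # register direct: copy the 64-bit value in RDI
        movq    (%rdi), %rax      # register indirect: load 8 bytes from the address in RDI
        movq    %rax, (%rdi)      # as a target: store 8 bytes to that address
        addq    $1, (%rdi)        # source/target: read, add 1, write back

        # (%rdi) alone does not fix the access size; the suffix or the
        # other operand does.
        movb    (%rdi), %al       # 1 byte (also implied by %al)
        movw    (%rdi), %ax       # 2 bytes
        movl    (%rdi), %eax      # 4 bytes
        movl    (%edi), %eax      # 4 bytes, but the address comes from 32-bit EDI
        cmpb    $'w', (%rdi)      # no register operand, so the b suffix is needed
        cmpq    $0x77, %rdi       # compares the value in RDI itself with 0x77
```

The last two lines correspond to the two readings in the question: the first compares the byte in memory at the address held in RDI with 'w', the second compares the contents of RDI with the ASCII code of 'w'.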

AT&T syntax distinguishes four kinds of operands:

  1. register operands where the operand is a register. The syntax is to give just the register name. An example of such an operand is %rax.

  2. immediate operands where the operand is an immediate value. The syntax is $ followed by an expression. An example is $42, but also $foo, $'w', $(42), $$foo (the value of the symbol named $foo), and $$foo+$$bar (the value of the symbol $foo plus the value of the symbol $$bar). Only the leading $ marks the operand as immediate; the rest is the expression.

  3. memory operands where the operand references a location in memory. The syntax is

    memory-operand = [ register : ] [ expression ] ( index )
      | [ register : ] expression
    index = register
      | register , register
      | [ register ] [ , [ register ]] , expression

    i.e. a memory operand optionally begins with a segment selector such as %ds:, then an optional displacement, and then an optional index. Displacement and index cannot both be omitted. The index is either a register, a pair of registers, or a scale-index-base type index where base and index (possibly including the second comma) can be omitted. If neither base nor index is given, the addressing mode is absolute (except for jumps, see below). To get RIP-relative addressing, use (%rip) for the index.

    Examples of memory operands are foo, (%rax), (foo) (where foo is the displacement as an expression), ($foo) (taking the value of symbol $foo as the displacement), %fs:(%rax, %rsi, 1), 23, and 1234(%rsp).

  4. indirect operands where the operand references a location in memory indirectly. The syntax is * followed by a memory operand. This is used to distinguish direct jumps (which take memory operands with absolute addressing, but treat them as if relative addressing were used) from indirect jumps, by analogy to PDP-11 assembly. An example is *foo, which jumps to the address stored in variable foo, as opposed to jumping to foo itself.

    Note that far direct jumps are encoded as taking immediate operands for historical reasons.
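The four operand kinds above can be put side by side in a short sketch for GNU as. The symbols foo and target are hypothetical names introduced only for this illustration. The fragment is meant to be assembled and read rather than run, and the absolute forms ($foo, foo) may be rejected at link time when building a position-independent executable.

```asm
        .data
foo:    .quad   target               # hypothetical variable holding a code address

        .text
        movq    %rax, %rbx           # 1. register operand
        movq    $42, %rax            # 2. immediate operand: the constant 42
        movq    $foo, %rax           #    immediate operand: the address of foo
        movq    foo, %rax            # 3. memory operand: the 8 bytes stored at foo (absolute)
        movq    foo(%rip), %rax      #    the same location, RIP-relative
        movq    1234(%rsp), %rax     #    displacement plus a register
        movq    (%rax,%rsi,8), %rax  #    base + index * scale
        movq    %fs:(%rax), %rax     #    with a segment override
        jmp     target               #    direct jump to the label target
        jmp     *foo                 # 4. indirect: jump to the address stored at foo
        jmp     *%rax                #    GNU as also accepts a register after *
target:
        ret
```

Note how jmp target and jmp *foo differ only by the *, which is exactly the distinction that item 4 describes.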

I hope this clears things up. Note especially how parentheses are used both for grouping within expressions and to denote indices. There is no syntactic ambiguity, because an index must contain a register or a comma, neither of which can appear in an expression.
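As a rough illustration of the two roles of parentheses, the following GNU as fragment puts both side by side. The symbol foo is a hypothetical name used only here, and the fragment is intended to be assembled and inspected, not executed.

```asm
        .data
foo:    .quad   0                    # hypothetical variable

        .text
        movq    $(2+3)*4, %rax       # grouping in an immediate expression: the constant 20
        movq    (foo), %rax          # grouping only: same operand as  movq foo, %rax
        movq    (%rdi), %rax         # a register inside: an index, so RDI is dereferenced
        movq    (2+3)*4(%rdi), %rax  # grouping in the displacement (20), then the index (%rdi)
        movq    (,%rsi,8), %rax      # commas inside: an index with no base, address RSI*8
```

Disassembling the result (for example with objdump -d) is one way to check how each operand was interpreted.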