How do callee-saved registers work? Who should push the original value onto the stack?

85 Views Asked by At

I'm learning callee- and caller-saved registers from CSAPP (Third Edition) on pg251, where I am aware that as for callee-saved registers:

Procedure Q (callee) can preserve a register value by either not changing it at all or by pushing the original value on the stack, altering it, and then popping the old value from the stack before returning.

Additionally, I've learned from this post that:

Callee-saved registers (AKA non-volatile registers, or call-preserved) are used to hold long-lived values that should be preserved across calls.

However, I am still confused when trying to understand an example in CSAPP (Third Edition) on pg252. The example is partially stated below:

long P(long x, long y) {
    long u = Q(y); // The detail of Q is irrelevant here
    long v = Q(x);
    return u + v;
}

The corresponding assembly is generated by gcc-11 as below:

_P:
LFB2:
    pushq   %rbp
LCFI2:
    pushq   %rbx
LCFI3:
    subq    $8, %rsp
LCFI4:
    movq    %rdi, %rbp
    movq    %rsi, %rdi
    call    _Q
    movq    %rax, %rbx
    movq    %rbp, %rdi
    call    _Q
    addq    %rbx, %rax
    addq    $8, %rsp
LCFI5:
    popq    %rbx
LCFI6:
    popq    %rbp
LCFI7:
    ret

It is clear that %rbp and %rbx are callee-saved registers, and we could see that procedure P (caller) pushes %rbp and %rbx onto the stack.

  • Does this mean that caller P (instead of callee Q) actually saves the values of the two registers?
  • If so, does it contradict the role of callee-saved registers?

I would appreciate it if you could help me with the questions above. Thank you very much!

2

There are 2 best solutions below

0
tkausl On BEST ANSWER

You have to keep in mind that P is not only a caller but also a callee.

_P:
LFB2:
    pushq   %rbp
LCFI2:
    pushq   %rbx

Here we can see that P saves rbp and rbx to save its callers values.


These two lines are interesting.

call    _Q
addq    %rbx, %rax

We can see that P calls Q and then adds rbx to its result (stored in rax). This implies rbx was not modified by Q which means Q must have preserved its value, i.e. it's callee saved.

1
Erik Eidt On

Does this mean that caller P (instead of callee Q) actually saves the values of the two registers?

These registers are saved by P for the benefit of P's callers.  No function can use call-preserved registers without save & restore (unless this function never returns to its caller).

If so, does it contradict the role of callee-saved registers?

No, it's necessary to follow the rules.


Since we don't see Q, we don't know what call-preserved registers it saves & restores — it only has to save call-preserved registers that it itself chooses to modify.  It might save & restore rbp, or both rbp and rbx, or neither, or even more.

Further, Q may call other functions, which may or may not use call-preserved registers — they are required to also preserve those registers' values.


Let's also note for fun that the above code could have used one less call-preserved register by rearranging the code just slightly.

The code (1) moves the return value/result of the first call to Q from rax into a into call-preserved register rbx (u), followed by (2) moving x (which was moved to rbp) into the parameter register rdi for the second call to Q.

By swapping the order of those two moves, the lifetimes of u and x will not overlap, and so could actually share the same storage location (e.g. a call-preserved register).

We would instead (1) move x (from rbp) into rdi for the second call, then (2) move rax into rbp — forgoing the need for two the call-preserved registers.


In another analysis, for the variables that are live across a call, I would have moved them onto the stack instead of using call-preserved registers.  This is because neither x nor u (the two variables that are live across a call) have any significant usage — each is read only once.

_P:
    subq    $8, %rsp
    movq    %rdi, (%rsp)          # one memory write
    movq    %rsi, %rdi
    call    _Q
    movq    (%rsp), %rdi          # one memory read
    movq    %rax, (%rsp)          # one memory write
    call    _Q
    addq    (%rsp), %rax          # one memory read
    addq    $8, %rsp
    ret

For a total of 10 instructions, with 2 writes and 2 reads to stack memory.

By comparison, the call-preserved register-using version you're showing above uses a total of 14 instructions, also with 2 writes and 2 reads to stack memory.

Now, if the body of the function P had a loop, then the call-preserved register using version could have a lower dynamic read & write count than the alternative that doesn't use call-preserved registers (probably: assuming the loop iterates more than once).