Performance impact of choice of registers in x86_64


I'm wondering: are there any comprehensive guidelines for how one should choose which registers to use when generating custom x86_64 assembly code?

I know the basics of pipelining and the implications of data dependencies (including false dependencies when using a smaller register without sign extension), but there are still details I'm unsure about. For example, when reading through a chain of memory operands (i.e. a set of nested pointers), is it better to reuse the same register or to use different ones?

mov         rcx,qword ptr [rbx]
mov         rcx,qword ptr [rcx]
mov         rcx,qword ptr [rcx]

For reference, I'm writing a native compiler for a stack-based scripting language. At this point, the language still has to use its own stack structure for certain operations, so the code shown here loads the stack-top pointer from our internal "register" struct into "rcx".

My initial instinct would have been to generate the code as seen above, to minimize the number of registers that have to be used during this operation. However, IIRC, I noticed a stark deterioration in performance compared to

mov         rcx,qword ptr [rbx]
mov         rdx,qword ptr [rcx]
mov         rax,qword ptr [rdx]

My best guess is that the second version might pipeline better, though I'm unsure how, since each instruction still depends on the value the previous one read from memory. And obviously, this complicates writing the code generator, as many operations would now potentially need multiple registers, especially when trying to set up the arguments for a function call.
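To make the dependency I mean explicit, here is roughly the same triple load sketched in C. The type and function names (`RegisterStruct`, `ScriptStack`, `load_stack_top`) are made up for illustration and are not my actual compiler's code; the comments only map each line to the `mov` it roughly corresponds to.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only (not the real compiler's structs). */
typedef struct {
    int64_t *top;            /* pointer to the current top slot of the script stack */
} ScriptStack;

typedef struct {
    ScriptStack *stack;      /* the internal "register" struct points at the stack */
} RegisterStruct;

/*
 * Three dependent loads: each one needs the address produced by the one before,
 * no matter which register the intermediate result is kept in.
 */
static int64_t *load_stack_top(RegisterStruct **base)
{
    RegisterStruct *regs = *base;        /* mov rcx, qword ptr [rbx] */
    ScriptStack *stack = regs->stack;    /* second mov: reads through the first result */
    return stack->top;                   /* third mov: reads through the second result */
}
```

Written this way, the address of every load comes from the load before it, which is why I don't see where the extra registers in the second version would help.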

So I'd like to make sure I actually understand the "best practices" for performant register usage, both for my case and in general. Can somebody clear up my confusion? Is there a source that puts this information together?
