I know this question was submitted many times, but I couldn't find any clear answer yet, so I'm sorry, but I still have the doubt.
When programming in assembly language for my own purposes, not necessarily linking against external libraries, but wanting to comply with written or unwritten conventions, which registers should I use, and in what order?
I thought of using the same registers that are used for function arguments (RDI, RSI, RDX, RCX, R8, R9) to store values when I need them. If I need more, I move on to R10-R15, but I'm not sure whether this is customary practice.
Also, what are the criteria for choosing the stack over registers?
PS. Sorry again for asking a question that was submitted to the site a few times now, but I just want to do it "right". Thanks.
In any calling convention, use the call-clobbered registers before call-preserved regs, except for values that you want to survive across a function call. My answer on that linked Q&A covers a lot of what you're asking about how / when to use registers.
For x86-64 on OSes other than Windows, see "What registers are preserved through a linux x86-64 function call" for calling-convention details like which registers are call-clobbered. (R12-R15 are not: they're call-preserved, like RBX and RBP.)
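As a minimal sketch of that rule for the x86-64 System V convention (NASM syntax): a leaf function can stay entirely in call-clobbered registers, while a value that has to survive a `call` goes in a call-preserved register that you save and restore yourself. The names `scale_add`, `sum_plus_helper`, and `helper` are hypothetical, made up for this illustration.

```nasm
; x86-64 System V (Linux, macOS, BSD), NASM syntax. Illustration only.
default rel
extern  helper                  ; hypothetical: long helper(void);
global  scale_add
global  sum_plus_helper

section .text

; long scale_add(long a, long b)    -> a + 4*b
; Leaf function: no calls, so call-clobbered registers are enough
; and nothing needs to be saved.
scale_add:
    lea     rax, [rdi + rsi*4]
    ret

; long sum_plus_helper(long a, long b)    -> helper() + a + b
; a + b must survive the call, so it lives in call-preserved RBX.
sum_plus_helper:
    push    rbx                 ; save the caller's RBX (also realigns RSP to 16)
    lea     rbx, [rdi + rsi]    ; RBX = a + b, safe across the call
    call    helper              ; may clobber RAX, RCX, RDX, RSI, RDI, R8-R11
    add     rax, rbx            ; return value = helper() + a + b
    pop     rbx                 ; restore the caller's RBX
    ret
```

The push/pop pair is the price of using a call-preserved register, which is why it's only worth paying for values that actually need to live across a call.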
Within the call-clobbered regs on x86-64, they're all equivalent for most purposes, although many instructions have smaller encodings for AL or EAX with an 8-bit or 32-bit immediate respectively. (And smaller machine-code size is generally better, all else equal, for better I-cache density and front-end decode throughput.)
Note that `add eax, 4` is shortest using the standard 3-byte `add r/m32, imm8` encoding, not 5-byte `add eax, imm32`. AL for 8-bit immediate operations always saves space, but EAX for 32 or 64-bit operations only saves space for constants that don't fit in an imm8, or for `test eax, immediate`, because there's no `test r/m, imm8` encoding. But of course you can always use `test al, 1` instead of `test eax, 1` if you want the low bit; the only thing you lose out on is things like `test eax, -128` to check all but the low 7 bits with sign-extension of a negative number.
See also "Why are RBP and RSP called general-purpose registers?" for details on extra code size for some addressing modes involving some registers (involving RBP, R12, and R13 as the base).
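To make the code-size differences above concrete, here is a sketch of the encodings in NASM syntax, with the machine-code bytes I'd expect in the comments (assuming NASM's default of picking the shortest immediate form; an assembler listing such as `nasm -l` will show what you actually get):

```nasm
bits 64
section .text

    add     eax, 4          ; 83 C0 04           3 bytes: add r/m32, imm8
    add     edx, 4          ; 83 C2 04           3 bytes: no benefit from EAX here
    add     eax, 1000       ; 05 E8 03 00 00     5 bytes: add eax, imm32 short form
    add     edx, 1000       ; 81 C2 E8 03 00 00  6 bytes: add r/m32, imm32

    add     al, 4           ; 04 04              2 bytes: AL short form
    add     dl, 4           ; 80 C2 04           3 bytes

    test    al, 1           ; A8 01              2 bytes
    test    eax, 1          ; A9 01 00 00 00     5 bytes: no imm8 form exists
    test    edx, 1          ; F7 C2 01 00 00 00  6 bytes

    ; Base registers that cost an extra byte in the addressing mode:
    mov     eax, [rdi]      ; 8B 07              2 bytes
    mov     eax, [rbp]      ; 8B 45 00           3 bytes: needs a disp8 of 0
    mov     eax, [rsp]      ; 8B 04 24           3 bytes: needs a SIB byte
    mov     eax, [r8]       ; 41 8B 00           3 bytes: REX prefix only
    mov     eax, [r12]      ; 41 8B 04 24        4 bytes: REX + SIB
    mov     eax, [r13]      ; 41 8B 45 00        4 bytes: REX + disp8
```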
That RBP/RSP Q&A also mentions the fact that most of the "legacy" registers (not R8-R15) have some special instruction that uses them implicitly. Like RCX (specifically CL) being the only register for variable-count shift counts, like `shr edx, cl`, unless you have BMI2 `shrx edx, eax, esi` (which is large code-size but more efficient on Intel, being single uop).
Another case where all else is not equal: "Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?" Even on Skylake, the `adc al, 0` short-form encoding is 2 uops, for no apparent reason, only fixed in Alder Lake.
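A small sketch of the variable-count shift case (NASM syntax). The register choices are arbitrary; it assumes the value is in EAX and the count in ESI, and the second version additionally assumes the CPU supports BMI2:

```nasm
bits 64
section .text

    ; Legacy form: the count has to be in CL, and the shift overwrites its operand.
    mov     ecx, esi            ; count -> CL
    mov     edx, eax            ; copy, since shr modifies its destination
    shr     edx, cl             ; D3 EA, 2 bytes: edx >>= (cl & 31)

    ; BMI2 form: count in any register, separate destination, FLAGS untouched.
    shrx    edx, eax, esi       ; 5 bytes (3-byte VEX prefix): edx = eax >> (esi & 31)
```

So if CL is already free and the count can be computed there in the first place, the legacy form is the most compact; `shrx` avoids the need to tie up RCX at the cost of code size.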