Calling convention to use for max. portability between x86 systems

I am working on a set of self-contained x86 assembly routines that I would like to make available to C programs on the systems below:

  • Linux 64-bit only
  • Windows 32-bit and 64-bit
  • (Good to have ultimately, Mac 64-bit, but this is not clear as Apple appear to be on their way to drop x86 in favour of ARM)

I use LLVM in some other capacity already and it is almost certain that I would use clang rather than gcc although I can envisage a situation of someone's wanting to compile the whole of it using gcc. The assembler will be NASM.

I develop both the routines and a C library that exposes them to users, i.e. everything is under my control and I can design everything as needed.

I expect that some users will actually use C++ but they will still link to the C library - that is, not with the assembly routines directly.

As I am new to assembly, I am in the process of discovering a wonderful maze of various calling conventions spread across systems, compilers, vendors, calling variants and languages. I cannot say that it does not make for interesting reads sometimes but I cannot say either that it is not confusing to beginners.

My take after reading up on it all is that I can simply start with cdecl for maximum portability in the initial version, and then think about special-casing other conventions if the need arises. Depending on what the routines actually do, I may be able to make things faster by using other conventions in specific cases.

But initially, as I would like to have something that works correctly and then optimise it even further - is it correct to say that settling on cdecl will offer maximum portability across the systems that I listed? Thank you.

There are 1 best solutions below


x86-64 Linux and macOS both use the x86-64 System V ABI. Windows uses its own calling convention. None of these x86-64 platforms call it "cdecl".

The normal approach is for your library to use the standard calling convention of each target platform, which means different asm for each one. One way to handle this is with asm macros that adapt the tops of your functions to the different calling conventions. Another is to parameterize register names, like ARG1 instead of hard-coding RDI, but that gets very complicated very fast if your functions are more than trivial pointer increments, or if you ever use a register for something other than a function arg.
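On the C side, choosing which asm variant applies comes down to knowing which ABI the compiler is targeting. The following is a minimal sketch of that detection using predefined compiler macros. The names `mylib_abi`, `mylib_target_abi`, and `mylib_abi_name` are hypothetical and only serve the illustration. It assumes the usual predefined macros (`_WIN64`, `__CYGWIN__`, `__x86_64__`, `_M_X64`, `__i386__`, `_M_IX86`) and does not try to cover every toolchain.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum mylib_abi {
    MYLIB_ABI_WIN64,   /* Windows x64 calling convention */
    MYLIB_ABI_SYSV64,  /* x86-64 System V (Linux, macOS) */
    MYLIB_ABI_X86_32,  /* a 32-bit x86 target; convention chosen per function */
    MYLIB_ABI_OTHER    /* not an x86 target at all */
};

/* Decide at compile time which default ABI the compiler is targeting. */
static enum mylib_abi mylib_target_abi(void)
{
#if (defined(_WIN64) || defined(__CYGWIN__)) && \
    (defined(__x86_64__) || defined(_M_X64))
    return MYLIB_ABI_WIN64;
#elif defined(__x86_64__)
    return MYLIB_ABI_SYSV64;
#elif defined(__i386__) || defined(_M_IX86)
    return MYLIB_ABI_X86_32;
#else
    return MYLIB_ABI_OTHER;
#endif
}

/* Human-readable label, e.g. for a build log or a self-test. */
static const char *mylib_abi_name(enum mylib_abi abi)
{
    switch (abi) {
    case MYLIB_ABI_WIN64:  return "win64";
    case MYLIB_ABI_SYSV64: return "sysv64";
    case MYLIB_ABI_X86_32: return "x86-32";
    default:               return "other";
    }
}
```

A build script or self-test could print `mylib_abi_name(mylib_target_abi())` to confirm that the assembled object files match the ABI the C compiler expects.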

On 32-bit Windows you have a choice of multiple conventions; fastcall / vectorcall are the two that suck the least. On every other 32-bit and 64-bit x86 platform, there's one standard calling convention. It'll be easier for people to use your library if you follow it.

Agner Fog's calling convention guide has some more detailed suggestions for dealing with portability of hand-written asm. https://www.agner.org/optimize/


You could in theory use x86-64 System V everywhere, but then on Windows MSVC would be unable to emit calls to your code. (GNU C compatible compilers like gcc, clang, and ICC could use __attribute__((sysv_abi)) in the prototypes on Windows, where their default calling convention is what MS names x64 fastcall.)

I guess you could use x64 fastcall everywhere and use __attribute__((ms_abi)) in your prototypes for non-MSVC compilers. That may cost some performance, especially if you want to use all the XMM regs (xmm6..15 are call-preserved in x64 fastcall). Also beware of compiler bugs; non-default calling conventions are not nearly as well tested.

If all your functions have 4 or fewer total register args, x64 fastcall is not too bad a calling convention in most respects. Otherwise, having more register args is usually more efficient. See also the related question "Why does Windows64 use a different calling convention from all other OSes on x86-64?"
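As a rough sketch of pinning one convention in the prototypes, the header below tags a routine with `__attribute__((ms_abi))` on GNU C compatible x86-64 compilers and leaves it untagged elsewhere. `MYLIB_CALL` and `mylib_sum5` are hypothetical names. The function body is written in C only as a stand-in so the example links without NASM; in the real library the definition would be the assembly routine.

```c
#include <stdint.h>

/* MYLIB_CALL is a hypothetical macro name.
 * On x86-64 with a GNU C compatible compiler, force the Windows x64
 * convention even where System V is the default. MSVC on x64 already
 * uses that convention, and non-x86 targets have nothing to pin. */
#if defined(__x86_64__) && defined(__GNUC__)
#  define MYLIB_CALL __attribute__((ms_abi))
#else
#  define MYLIB_CALL
#endif

/* Prototype as the public header would declare the routine.
 * With five integer args, the fifth is passed on the stack under the
 * Windows x64 convention, while System V would still use a register. */
uint64_t MYLIB_CALL mylib_sum5(uint64_t a, uint64_t b, uint64_t c,
                               uint64_t d, uint64_t e);

/* Stand-in definition (assumption: replaces the asm for this sketch). */
uint64_t MYLIB_CALL mylib_sum5(uint64_t a, uint64_t b, uint64_t c,
                               uint64_t d, uint64_t e)
{
    return a + b + c + d + e;
}
```

Callers need no special syntax: `mylib_sum5(1, 2, 3, 4, 5)` compiles to a call using whatever convention the prototype names. Swapping `ms_abi` for `sysv_abi` would give the opposite choice described above, with the caveat that MSVC has no equivalent.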


32-bit and 64-bit are obviously vastly different; none of the standard calling conventions are compatible between 32 and 64-bit code, and your code will usually need to be pretty different anyway.

The only real similarity is between 32-bit Windows fastcall and the standard 64-bit Windows calling convention (which MS also calls fastcall). But 32-bit fastcall only passes the first 2 args in regs (ecx and edx), and the callee pops stack args. 64-bit fastcall passes the first 4 args in regs, starting with the same 2 (as rcx and rdx) but then using r8 and r9, which only exist in 64-bit mode.
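The integer-argument register assignments discussed in this answer can be summarized as a small lookup table. This is only a sketch: `conv_info`, `find_conv`, and `int_arg_reg` are hypothetical names, and the table covers integer/pointer args only (floating-point args use XMM registers and are left out).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types and names, for illustration only. */
struct conv_info {
    const char *name;         /* label used for lookup */
    size_t      n_int_regs;   /* how many integer args travel in registers */
    const char *int_regs[6];  /* register for each arg position, in order */
};

static const struct conv_info convs[] = {
    { "win32-fastcall", 2, { "ecx", "edx" } },
    { "win64",          4, { "rcx", "rdx", "r8", "r9" } },
    { "sysv64",         6, { "rdi", "rsi", "rdx", "rcx", "r8", "r9" } },
};

/* Look up a convention by name; returns NULL if it is not in the table. */
static const struct conv_info *find_conv(const char *name)
{
    for (size_t i = 0; i < sizeof convs / sizeof convs[0]; i++) {
        if (strcmp(convs[i].name, name) == 0)
            return &convs[i];
    }
    return NULL;
}

/* Register holding the idx-th (0-based) integer arg, or NULL when the
 * convention is unknown or the arg is passed on the stack instead. */
static const char *int_arg_reg(const struct conv_info *c, size_t idx)
{
    if (c == NULL || idx >= c->n_int_regs)
        return NULL;
    return c->int_regs[idx];
}
```

For instance, `int_arg_reg(find_conv("win64"), 2)` yields `"r8"`, while the same index under `"win32-fastcall"` yields `NULL` because the third arg is already on the stack there. A table like this is essentially what ARG1-style register macros in the asm source have to encode.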