Is there any way to force a C function to be optimized on clang even when the file is compiled with -O0?

I'm looking for something equivalent to gcc's __attribute((optimize("s"))) or __attribute((optimize(3))).

(Related: In clang, how do you use per-function optimization attributes?)


What I'm trying to do is generate certain functions in almost pure assembly via a macro; the remaining C code in there shouldn't generate any assembly code. Ideally, the macro would use C-based integer constant expressions to choose which code to paste, and writing static before it would make the generated function static. I also want no stack manipulation in the function's prologue.

On GCC something like:

enum { CONSTANT = 0 };
__attribute((optimize("Os"),noinline,noipa))
int foo(void){
    if (CONSTANT) asm("mov $1, %eax; ret;");
    else asm("xor %eax, %eax; ret;");
    __builtin_unreachable();
}

gets the gist of it successfully. On clang, the optimize attribute is unrecognized and a push %rbp; mov %rsp, %rbp prologue is generated. That would break my real use case, as well as the ret in this toy example, so it's most undesirable.

On GCC, __attribute((naked)) also works to eliminate the prologue and disable inlining and Inter-Procedural Analysis (IPA), but clang hard-rejects it, enforcing the requirement that naked functions consist of pure assembly only (not even C code that generates no instructions).

Per the GCC docs for x86 function attributes:

naked

This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.

While not supported, it was working well enough for my use-case. The hack with __attribute__((optimize("Os"),noinline,noipa)) is even more hacky but does in fact compile to the asm I want with current GCC. I'd like to do something similar with clang.


3
Jester On BEST ANSWER

How about you put the selector and the alternatives into three separate functions, with the latter two marked with __attribute((naked)) that you say works? Something like this:

enum { CONSTANT = 0 };
__attribute((naked))
int foo1(void){
    asm("mov $1, %eax; ret;");
}
__attribute((naked))
int foo0(void){
    asm("xor %eax, %eax; ret;");
}
int foo(void){
    if (CONSTANT) return foo1();
    else return foo0();
}
0
nielsen On

At least one attempt to implement this in clang was abandoned.

I think the only way is to put the function in a file by itself and compile that file with the optimization you want.

Building the function with -O2 does get rid of the prologue (see here).

4
Peter Cordes On

Jester's answer is probably good for simple-enough cases if you can manually create every combination of asm blocks you might need. If they're only used in one compilation unit, they can be static to let the unused ones optimize away.

But you do want the non-naked wrapper (foo in Jester's example) to be visible for inlining, so you don't get an extra jmp tailcall on every call. That means all the callers have to be in the same compilation unit.

If that's not viable, link-time optimization should let the unused versions optimize away and not bloat your binary.


If you have many different branches that would lead to too large a combinatorial explosion of possibilities to maintain, you should definitely consider adding a step to your build system to get these constants as CPP macros so you can do this with #if or #ifdef around multiple asm(""); statements in a naked function in a well-defined way.
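A minimal sketch of that preprocessor approach, reusing the toy foo from the question. It assumes the build supplies CONSTANT as a macro (the `#ifndef` default here is only a stand-in for something like -DCONSTANT=1) and that the target is x86-64 with AT&T-syntax inline asm; the plain-C fallback is there only so the sketch still builds on other targets.

```c
/* Hypothetical build-provided selector; a real build might pass -DCONSTANT=1. */
#ifndef CONSTANT
#define CONSTANT 0
#endif

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* Naked function: the body is basic asm only, and the preprocessor
 * (not a C-level if) picks which asm statement is pasted in. */
__attribute__((naked))
int foo(void)
{
#if CONSTANT
    __asm__("mov $1, %eax\n\tret");
#else
    __asm__("xor %eax, %eax\n\tret");
#endif
}
#else
/* Fallback for non-x86-64 targets, not part of the technique itself. */
int foo(void)
{
    return CONSTANT ? 1 : 0;
}
#endif
```

Because the selection happens before the compiler proper ever sees the body, the naked function contains nothing but basic asm statements, which is the form both GCC and clang document as supported.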

What you're doing now with non-naked functions is a horrible abuse of the compiler that's not at all supported, merely happens to work.

if(constant) inside a naked function is also not officially supported, but seems to me like something that's less likely to break, as long as the constants are truly compile-time constant expressions. Still, no guarantees, unlike if you use the C preprocessor to just paste text together.

0
Petr Skocik On

Here's what I think is probably my most flexible solution to this so far with zero extra generated code in optimized builds:

  1. use a non-naked func with inline assembly blocks intermixed with C
  2. don't try to avoid prologues but undo them if they're generated

Step two is encapsulated by a MAYBE_DELETE_FRAME_FOR() macro, which is to be used at the very beginning of such a pseudo-naked function. It assumes that:

  1. any possible prologue is a frame setup (can be undone by the "leave" instruction)
  2. no frames are set up in optimized builds*
  3. a macro is defined by the build to distinguish nonoptimized and optimized builds

(*the default for optimized codegen on x86-64 SysV ABI unless VLAs/allocas or inline assembly with rsp clobbers are used)

#if NO_OPTIMIZATION /*build system should set it IFF -O0*/
    #define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) __asm("leave;")
#else
    #define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) /**/
#endif
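One possible way to satisfy assumption 3 without involving the build system, sketched here as an assumption rather than as part of the original answer: GCC and clang predefine `__OPTIMIZE__` for optimizing compilations, so NO_OPTIMIZATION could be derived from it. Note that this only detects the optimization level; it does not prove that no frame is set up (for example, -fno-omit-frame-pointer would still break assumption 2). The helper built_without_optimization is hypothetical and exists only so the result can be inspected.

```c
/* Assumption: the compiler is GCC or clang, which predefine __OPTIMIZE__
 * for optimizing compilations and leave it undefined at -O0. */
#ifndef NO_OPTIMIZATION
#  ifdef __OPTIMIZE__
#    define NO_OPTIMIZATION 0
#  else
#    define NO_OPTIMIZATION 1
#  endif
#endif

/* Hypothetical helper: returns 1 for an unoptimized build, 0 otherwise. */
static int built_without_optimization(void)
{
    return NO_OPTIMIZATION;
}
```

A value still passed explicitly by the build (-DNO_OPTIMIZATION=...) takes precedence over the derived one because of the outer `#ifndef`.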

A version of the macro could be defined regardless of optimization config by measuring the distance from the function start to the first user-issued assembly instruction. If that distance is found to be nonzero, it is statically asserted to be 4 (only push %rbp; mov %rsp, %rbp prologues are expected) and leave is generated; otherwise nothing is generated:

#define MAYBE_DELETE_FRAME_FOR(FUNC_NAME) __asm(\
        ".if .-" #FUNC_NAME "\n" \
            ".if .-" #FUNC_NAME "!= 4\n" \
                ".err\n" \
            ".endif\n" \
            "leave\n" \
        ".endif\n" \
    )

Unfortunately, this more foolproof version of the macro again fails on clang, because clang does not consider the .-FUNC_NAME label subtraction to be an absolute expression. Interestingly, it does consider it to be an absolute expression in an equivalent *.s file. I think this discrepancy is a clang bug: https://github.com/llvm/llvm-project/issues/62520