Why does memory-barrier prohibit optimization on static global variable?

229 Views Asked by At

The following code contains an interrupt-service-routine and a normal function func(), which uses the global flag and additionally the static global g. Without any memory-barrier this code is faulty, since flag is modified asynchronously.

Introducing the global memory-barrier <1> fixes that, but also inhibits the optimizations on g. My expectation was that all access to g would be optimized out, since g is not accessible outside this TU.

I know that a global memory barried has the same effect as calling a non-inline function f() <3>. But here is the same question: since g is not visible outside this TU, why should not optimize the access to g.

I tried to use a specific memory-barrier against flag but that did not help either.

(I avoided qualifying flag as volatile: this would help here, but it should only be used accessing HW-registers).

The question now is how to get the accesses to g optimized?

Compiler: avr-gcc

https://godbolt.org/z/ob6YoKx5a

#include <stdint.h>

uint8_t flag;  

void isr() __asm__("__vector_5") __attribute__ ((__signal__, __used__, __externally_visible__)); 
void isr() {
    flag = 1;
}

static uint8_t g;

void f();

void func(void) {
  for (uint8_t i=0; i<20; i++) {
//      f(); // <3>
//      __asm__ __volatile__ ("" : : : "memory"); // <1>
//          __asm__ __volatile__ ("" : "=m" (flag)); // <2>
    ++g;
    if (flag) {
      flag = 0;
    }
  }
}

//void f(){}
3

There are 3 best solutions below

2
Lundin On

Lots of misconceptions.

  • There is nothing called "static global", that's like saying "dog cat", they are each other's opposites. You can have variables declared at local scope or at file scope. You can have variables with internal linkage or external linkage.

    A "global" - which isn't a formal term, is a variable declared at file scope with external linkage which may be referred to by other parts of the program using extern. This is almost always bad practice and bad design.

    static ensures that a variable no matter where it was declared has internal linkage. So it is by definition not "global". Variables declared static are only accessible inside the scope where they were declared. For details check out What does the static keyword do in C?

  • Memory barriers is not a concept that makes much sense outside multicore systems. The purpose of memory barriers are to prevent concurrent execution/pipelining, pre-fetching or instruction reordering in multicore systems. Also, memory barriers do not guarantee re-entrancy on the software level.

    This is some single core AVR, one of the simplest CPUs still manufactured, so memory barriers is not an applicable concept. Do not read articles about PC programming on 64 bit x86 and try to apply them to 8-bit legacy architectures from the 1990s. Wrong tool, wrong purpose, wrong system.

  • volatile does not necessarily act as a memory barrier even on systems where that concept is applicable. See for example https://stackoverflow.com/a/58697222/584518.

    volatile does not make code re-entrant/thread-safe/interrupt-safe on any system, including AVR.

    The purpose of volatile in this context is to protect against incorrect compiler optimizations when the compiler doesn't realize that an ISR is called by hardware and not by the program. We can't volatile qualify functions or code in C, only objects, hence variables shared with an ISR need to be volatile qualified. Details and examples here: https://electronics.stackexchange.com/questions/409545/using-volatile-in-embedded-c-development/409570#409570

As for what you should be doing instead, I believe my answer to your other question here covers that.

15
emacs drives me nuts On

__asm__ __volatile__ ("" :: "m" (flag):"memory"); // <2>

You are clobbering all of memory due to "memory".

If you want to express that only flag is changed (and that the change has volatile effect(s)), then:

__asm__ __volatile__ ("" : "+m" (flag));

This tells GCC that flag is changed, not just an input to the asm like in <2>.

8
Peter Cordes On

@Lundin is correct that if (flag) flag = 0; isn't an atomic RMW, and wouldn't be even with volatile (still just separate atomic-load and atomic-store; an interrupt could happen between them.) See their answer for more about that, and that this seems like fundamentally wrong approach for some goals. Also, avoiding volatile only makes sense if you're replacing it with _Atomic; that's what Herb Sutter's 2009 article you linked was saying. Not that you should use plain variables and force memory access via barriers; that is fraught with peril, as compilers can invent loads or invent stores to non-atomic variables, and other less-obvious pitfalls. If you're going to roll your own atomics with inline asm, you need volatile; GCC and clang support that usage of volatile since it's what the Linux kernel does, as well as pre-C11 code.


Optimizing away g

The barrier isn't what blocks GCC from optimizing away g entirely. GCC misses this valid optimization when there's a function as simple as void func(){g++;} with no other code in the file, except the declarations.

But g is be optimized away even with asm("" ::: "memory") if the C code that uses it won't produce a chain of different values across calls. Storing a constant is fine, and storing a constant after an increment is enough to make the increment a dead store. Perhaps GCC's heuristic for optimizing it away only considers a chain of a couple calls, and doesn't try to prove that nothing value-dependent happens?

#include <stdint.h>

//uint8_t flag;
static uint8_t g;

void func(void) {
  __asm__ __volatile__ ("" ::: "memory"); // <1>
    
    int tmp = g;
    g = tmp;
    ++g; g = 1;
    // g = tmp+1;  // without a later constant store to make it dead will make GCC miss the optimization

  __asm__ __volatile__ ("" ::: "memory");  // <1>

}

GCC12 -O3 output for AVR, on Godbolt:

func:
        ret

The "memory" clobber forces the compiler to assume that g's value could have changed, if it doesn't optimize it away. But it doesn't make all static variables in the compilation implicit inputs/outputs. The asm statement is implicitly volatile because it has no output operands.

Telling the compiler that only flag was read+written should be equivalent. Except if g doesn't get optimized away, GCC can hoist the load of g out of the loop, only storing incremented values. (It misses the optimization of sinking the store out of the loop. That would be legal; the "+m"(flag) operand tells the compiler that flag was read + written so could have any value now, but without a "memory" clobber, the compiler can assume the asm statement didn't read or write any other state of the C abstract machine from registers or memory.

Your statement with an "=m" (flag) output-only operand is different: it tells the compiler that the old value of flag was irrelevant, not an input. So if it was unrolling the loop, any stores to flag before one of those asm statements would be a dead store.

(The asm statement is volatile so it does have to run it as many times as its reached in the abstract machine; it has to assume there might be side-effects like I/O to something that isn't a C variable. So the previous asm statements can't be removed as dead because of that, but only because of volatile.)