Process stacks and interrupts on Cortex-M ARM cores


According to the ARMv7-M and ARMv8-M reference manuals, the exception stack frame is pushed onto the currently active stack (MSP or PSP, depending on what was interrupted by the exception).

This decision seems illogical to me: every process stack has to reserve space for an exception stack frame, which can be large, especially when FPU and Security Extension state is stacked. More importantly, it leaves at least one question unanswered: how can process stack overflows be isolated from the rest of the system?

Suppose you have an ARMv8-M platform (e.g. Cortex-M33) running an unprivileged process with MPU restrictions enforced. The process has a single MPU region for its stack, and PSPLIM is also set. The process is running near its stack limit, and the remaining stack space is insufficient to hold an exception frame.

Now a peripheral interrupt arrives. Most likely you will get a UsageFault with the STKOF flag set. This is where the problems start. First, the interrupt has been missed; most likely it is still pending and will be taken again. But how do you recover?

UsageFault handling is subject to the same stack limits, and there is still no space for the exception frame. HardFault can ignore stack limits, but this does not make the situation any better: an ignored stack limit means that the memory beyond the stack is now corrupted. You could reserve some space below PSPLIM (in the direction of stack growth) specifically for HardFault, so that at least no memory gets corrupted.

Is there a safe way to deal with this situation? The system should remain consistent and operational regardless of bugs (or malicious behavior) in the unprivileged process.

Accepted answer

TL;DR

The stack frame is not written, and the context of the currently executing task is lost. Inaccessible memory is not corrupted. A UsageFault (for a stack limit violation) or a MemManage fault (for an MPU violation) is taken instead of the original exception. This behavior is documented in the ARM reference manual. The failed stacking is signalled by the MMFSR.MSTKERR or UFSR.STKOF bit, depending on the exception.

Test program

// Configuration defines:
// #define TESTCASE 0                // 0, 1 or 2
// #define ENABLE_STACK_LIMIT        // Enables SPLIM registers
// #define ENABLE_MPU                // Enables MPU regions enforced for unprivileged code

#include <stm32u5xx.h>
#include <cstring>

enum {
    MSP = 0x20001000,   // Main stack pointer
    MSS = 32,           // Main stack size
    PSP = 0x20000F80,   // Process stack pointer
    PSS = 32,           // Process stack size
    XSO = 0x800,        // Offset of stack area from MSP
    XSS = 0x1000,       // Total size of stack area

    FLASH_START     = 0x08000000,
    FLASH_END       = 0x08010000
};

#define EXCEPTION_STUB(func)                                                                            \
    extern "C" [[gnu::naked]] void func() {                                                             \
        __asm volatile (                                                                                \
            "ldr r0, =$0xDDCCBBAA\n"                                                                    \
            "push {r0}\n"               /* Push marker value to stack to see it in the debugger */      \
            "add sp, 4\n"               /* Restore stack pointer after push */                          \
            "bkpt\n"                                                                                    \
            "bx lr\n"                                                                                   \
            ::: "r0", "memory"                                                                          \
        );                                                                                              \
    }

EXCEPTION_STUB(HardFault_Handler)
EXCEPTION_STUB(BusFault_Handler)
EXCEPTION_STUB(MemManage_Handler)
EXCEPTION_STUB(UsageFault_Handler)
EXCEPTION_STUB(SVC_Handler)

int main() {
    memset((void *) (MSP - XSS), 0x00, XSS + XSO);
    memset((void *) (MSP - MSS), 0x55, MSS);
    memset((void *) (PSP - PSS), 0xAA, PSS);

    SCB->SHCSR = SCB_SHCSR_USGFAULTENA_Msk | SCB_SHCSR_MEMFAULTENA_Msk | SCB_SHCSR_BUSFAULTENA_Msk;

#if defined(ENABLE_MPU)
    /* Regions must be 32-byte aligned to meet MPU requirements */
    static_assert(((PSP - PSS) & 0x1F) == 0);
    static_assert((PSP & 0x1F) == 0);
    static_assert((FLASH_START & 0x1F) == 0);
    static_assert((FLASH_END & 0x1F) == 0);

    /* Region 0: stack, RW, execute-never */
    MPU->RNR = 0;
    MPU->RBAR = (PSP - PSS) | (0b10 << MPU_RBAR_SH_Pos) | (0b01 << MPU_RBAR_AP_Pos) | MPU_RBAR_XN_Msk;
    MPU->RLAR = ((PSP - 1) & MPU_RLAR_LIMIT_Msk) | MPU_RLAR_EN_Msk;

    /* Region 1: flash, RO, executable */
    MPU->RNR = 1;
    MPU->RBAR = FLASH_START | (0b10 << MPU_RBAR_SH_Pos) | (0b11 << MPU_RBAR_AP_Pos);
    MPU->RLAR = ((FLASH_END - 1) & MPU_RLAR_LIMIT_Msk) | MPU_RLAR_EN_Msk;

    MPU->MAIR0 = 0b01000100;    // Normal memory, non-cacheable
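    MPU->CTRL = MPU_CTRL_ENABLE_Msk | MPU_CTRL_PRIVDEFENA_Msk;
    __DSB();                    // Complete the MPU configuration writes
    __ISB();                    // before any following instruction is fetched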
#endif

    __set_MSP(MSP);
    __set_PSP(PSP);

#if defined(ENABLE_STACK_LIMIT)
    __set_MSPLIM(MSP - MSS);
    __set_PSPLIM(PSP - PSS);
#endif

    __set_CONTROL(__get_CONTROL() | CONTROL_SPSEL_Msk | CONTROL_nPRIV_Msk);
    __ISB();

#if TESTCASE == 0
    /* Stack pointer stays valid in this test case */
    /* Decrement it so stack frame (32 bytes) won't fit */
    __asm volatile ("sub sp, 4");
#elif TESTCASE == 1
    /* Stack pointer is manually adjusted to cause stack overflow */
    __asm volatile (
        "ldr r0, =$0x20000F00\n"
        "mov sp, r0\n"
        "isb\n"
        ::: "r0", "memory"
    );
#elif TESTCASE == 2
    /* Stack pointer is corrupted upwards and placed above the original stack */
    __asm volatile (
        "ldr r0, =$0x20000FA0\n"
        "mov sp, r0\n"
        "isb\n"
        ::: "r0", "memory"
    );
#endif

    __asm volatile (
        "ldr r0, =$0x44332211\n"    /* Put markers in the registers to make stack frame more visible in memory view */
        "ldr r1, =$0x88776655\n"
        "bkpt\n"                    /* Last chance to inspect state of the core */
        "svc 123\n"                 /* Trigger exception */
        "bkpt\n"                    /* Halt again if SVC has returned */
        ::: "r0", "r1", "memory"    /* r1 is overwritten too, so it must be in the clobber list */
    );

    return 0;
}

Implemented test cases:

  1. TESTCASE 0, simple stack overflow during exception entry: the stack limit check (SPLIM) is sufficient to catch this.
  2. TESTCASE 1, SP moved below the current stack: the stack limit check is sufficient. The exception is raised when SP is written (this is documented behavior too), so no memory access is required.
  3. TESTCASE 2, SP moved above the current stack: the stack limit check cannot detect this, so the MPU is required.

The stack limit registers are mostly redundant when the MPU is active, but they can still be useful when another MPU region directly adjoins the stack region, so that an overflow into it does not generate a MemManage fault.

Both a Thread-mode ("regular") stack overflow and a context stacking failure set UFSR.STKOF. From the handler's point of view, the exact cause of the overflow does not matter: the task context is lost either way.

References

The observed behavior is documented in the following sections of the ARMv8-M Architecture Reference Manual:

  1. B3.18 Exception handling

    RWBND: Preemption of current execution causes the following basic sequence:

    • R0-R3, R12, LR, RETPSR, including CONTROL.SFPA, are stacked.
    • The return address is determined and stacked.
    • <...>
    • The exception to be taken is chosen, and IPSR.Exception is set accordingly. The setting of IPSR.Exception to a nonzero value causes the PE to change to Handler mode.

    This implies that context stacking happens while the PE is still in Thread mode, with all of the thread's access restrictions still in effect.

  2. B3.19 Exception entry, context stacking

    RVNSK: If one or more of the following exceptions is generated during the stacking operations on exception entry the PE is permitted to abandon any remaining stacking operations:

    • MemManage fault
    • STKOF UsageFault

    IFKBH: If a MemManage fault, BusFault, or AUVIOL SecureFault occurs on a stacking memory access during exception entry, then stacking of Additional state context is optional.

  3. B3.21 Stack limit checks

    RZLZG: On a violation of a stack limit during either exception entry or tail-chaining:

    • In a PE with the Main Extension, a synchronous STKOF UsageFault is generated. Otherwise, a HardFault is generated.
    • The stack pointer is set to the stack limit value.
    • Push operations to addresses below the stack limit value are not performed.

    IBJHX: When an instruction updates the stack pointer, if it results in a violation of the stack limit, it is the modification of the stack pointer that generates the exception, rather than an access that uses the out-of-range stack pointer.

  4. B3.24 Exceptions during exception entry

    ILBGQ: During exception entry exceptions can occur <...>, for example a MemManage fault on the push to the stack.
    <...>
    When the exception entry sequence itself causes an exception, the latter exception is a derived exception.

    RMRTR: For Derived exceptions, late-arrival preemption is mandatory.

Another answer

The configuration you suggest, unprivileged code with the MPU active and running on the Main stack, requires careful allocation of stack space. The Main stack must have enough space for the fixed-priority exceptions (NMI, HardFault) and for any other nesting of system exceptions and interrupts. Depending on how system exception and interrupt priorities are assigned, this can add up to a substantial amount of space.

The situation is more predictable if Handler mode processing is placed on the Main stack and the unprivileged Thread mode code uses the Process stack. In that case the Process stack needs room for only one exception stack frame, because once an exception (say, an interrupt) has been taken, any higher-priority exceptions that preempt it are stacked on the Main stack. This configuration makes stack usage easier to understand and to set up.

I usually assign all the system exceptions the same priority, higher than the interrupts, which in turn are higher than SVC and PendSV. The Main stack then needs space for 3 exception frames, plus one more per level of interrupt nesting (I usually use only one interrupt priority, so there is no nesting), plus the stack usage of the handlers themselves (which the compiler will estimate). That leaves the Process stack to hold the stack usage of the unprivileged code (again, the compiler will help) plus one exception frame.

I'm not sure what form of recovery from system exceptions you require, but I treat them all as unrecoverable and just do the best I can to save state that can be examined after a reset.