SEGFAULT on writing to stack variable

I have a very strange crash on an ARM Linux platform caused by simple code. It reproduces rarely (about once a day), and it crashes at a point where it seemingly cannot.

Let's start with the C++ code. The thread function does this:

    event_obj events[EVENTS_MAX]; // EVENTS_MAX = 32
    int num = 0;
    m_engine->getEvents(events, &num);

m_engine is a pointer to an abstract base class, which has only one implementation at the moment. getEvents is a pure virtual method.

After some changes, getEvents does nothing but this:

    int engine::getEvents(event_obj*, int* num)
    {
        if (num != nullptr)
        {
            *num = 0; // SEGMENTATION FAULT
        }
        return 1; // ok
    }

The segfault happens when trying to store 0 through num. At first I thought it was stack corruption, but after checking the generated assembly it seems that nothing is stored on the stack here. This method doesn't even have stack protection generated (-fstack-protector-strong is enabled); both parameters are passed in registers r1 and r2. Here is the code for the function call:

        event_obj events[EVENTS_MAX];
        int num = 0;
   236f8:       2300            movs    r3, #0
   236fa:       ac06            add     r4, sp, #24
   236fc:       9306            str     r3, [sp, #24]
        m_engine->getEvents(events, &num);
   236fe:       6803            ldr     r3, [r0, #0]
   23700:       691b            ldr     r3, [r3, #16]
   23702:       4622            mov     r2, r4
   23704:       a90c            add     r1, sp, #48     ; 0x30
   23706:       4798            blx     r3

and the code for the function itself:

int engine::getEvents(event_obj*, int* num)
{
    if (num != nullptr)
   251f8:       4613            mov     r3, r2
   251fa:       b10a            cbz     r2, 25200 <_Z18engine_thread_funcPv+0x9e0>
    {
        *num = 0;
   251fc:       2200            movs    r2, #0
   251fe:       601a            str     r2, [r3, #0]
    }
    return 1; // ok
}
   25200:       2001            movs    r0, #1
   25202:       4770            bx      lr
    return 1; // ok
}

As you can see from the generated code, the pointers are put into registers r1 and r2:

   23702:       4622            mov     r2, r4
   23704:       a90c            add     r1, sp, #48     ; 0x30

Even if the stack is corrupted, that could corrupt the value of the num variable, but how can it corrupt a pointer held in a register? Also, from the crash log I can see that the LR address is wrong:

CRASH signal 11 Segmentation fault address 0xf0000000 PC 0x251fe LR 0x6c3c533c
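
As a rough sanity check on that LR value, here is a sketch of what it would be expected to hold if getEvents had been entered from the blx r3 in the listing. It assumes the crash handler reports LR as it was at the faulting instruction and that the call was made in Thumb state (the 16-bit encodings in the listing suggest so). return_address and lr_matches_call_site are made-up helper names, not part of the real code.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the listing and the crash log in this post.
constexpr std::uint32_t kCallSite = 0x23706;    // address of "blx r3"
constexpr std::uint32_t kCallSize = 2;          // 16-bit encoding (4798)
constexpr std::uint32_t kLoggedLr = 0x6c3c533c; // LR from the crash log

// Hypothetical helper: address of the instruction that follows the call.
constexpr std::uint32_t return_address(std::uint32_t call_site,
                                       std::uint32_t size) {
    return call_site + size;
}

// Hypothetical helper: compare LR with the expected return address,
// ignoring bit 0, which only marks Thumb state.
constexpr bool lr_matches_call_site(std::uint32_t lr,
                                    std::uint32_t call_site,
                                    std::uint32_t size) {
    return (lr & ~1u) == return_address(call_site, size);
}
```

With these values the expected return address is 0x23708 (0x23709 with the Thumb bit set), and lr_matches_call_site(kLoggedLr, kCallSite, kCallSize) is false. The getEvents body shown in the listing never writes LR, so under the assumptions above the logged value does not look like a return into this call site.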

The only thing I cannot see from here is the target address of the jump (blx r3), because the called method is virtual. I have one very unlikely theory: instead of jumping to the first instruction of the virtual method body, it jumped a few instructions before it, which corrupted the registers, but I don't see how that is possible. Also, it always crashes at the same line, even after changing the code. That is very strange.
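
The reason the target is not visible in the listing can be modelled in plain C++: the two ldr instructions before blx r3 read the target out of memory reachable from the object pointer in r0. This is only a sketch of that sequence, not what the compiler actually emits. FakeEngine, Slot, kTable and call_get_events are invented names, event_obj is a placeholder, and the slot index assumes 4-byte pointers as on 32-bit ARM.

```cpp
#include <cassert>

struct event_obj {};   // placeholder; the real type is not shown in the post

struct FakeEngine;     // hypothetical stand-in for the engine object
using Slot = int (*)(FakeEngine*, event_obj*, int*);

struct FakeEngine {
    const Slot* vptr;  // first word of the object: pointer to the table
};

// Same body as engine::getEvents in the post.
int get_events_impl(FakeEngine*, event_obj*, int* num) {
    if (num != nullptr) {
        *num = 0;
    }
    return 1; // ok
}

// With 4-byte pointers, byte offset 16 corresponds to slot index 4.
constexpr int kGetEventsSlot = 16 / 4;

// Table for the single implementation; the other slots are unused here.
static const Slot kTable[kGetEventsSlot + 1] = {
    nullptr, nullptr, nullptr, nullptr, get_events_impl};

// Mirrors: ldr r3, [r0, #0] ; ldr r3, [r3, #16] ; blx r3
int call_get_events(FakeEngine* self, event_obj* events, int* num) {
    const Slot* table = self->vptr;       // ldr r3, [r0, #0]
    Slot target = table[kGetEventsSlot];  // ldr r3, [r3, #16]
    return target(self, events, num);     // blx r3
}
```

Nothing in this sequence validates self. If the memory behind the object pointer has been freed or overwritten, table and target are whatever bytes happen to be there, and the branch goes wherever they point.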

Can someone suggest something to try? Any ideas?

Thanks in advance.

There is 1 answer below.

The fault occurs because engine is no longer valid. The object holding engine has probably been deallocated, i.e. the memory your thread depends on is gone. As such, engine->getEvents is not even valid in memory. Something happened elsewhere in your code after which the threads should have stopped running and exited, but they haven't. This is much like a callback into an application that is exiting.
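
A minimal sketch of the usual remedy: tell the thread to stop and join it before the engine is destroyed. The question does not show how the thread or the engine is owned, so this assumes a std::thread owned by the same object that owns the engine. worker, engine_base, engine_impl and the stop flag are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

struct event_obj {};  // placeholder; the real type is not shown

struct engine_base {
    virtual ~engine_base() = default;
    virtual int getEvents(event_obj* events, int* num) = 0;
};

struct engine_impl : engine_base {
    int getEvents(event_obj*, int* num) override {
        if (num != nullptr) {
            *num = 0;
        }
        return 1; // ok
    }
};

// Hypothetical owner of both the engine and the thread that uses it.
class worker {
public:
    worker()
        : m_engine(std::make_unique<engine_impl>()),
          m_thread([this] { run(); }) {}

    ~worker() {
        m_stop.store(true);  // ask the thread to leave its loop
        m_thread.join();     // wait for it before m_engine is destroyed
    }

    int calls() const { return m_calls.load(); }

private:
    void run() {
        event_obj events[32];
        while (!m_stop.load()) {
            int num = 0;
            m_engine->getEvents(events, &num);
            m_calls.fetch_add(1);
            std::this_thread::yield();
        }
    }

    std::atomic<bool> m_stop{false};
    std::atomic<int> m_calls{0};
    std::unique_ptr<engine_base> m_engine;
    std::thread m_thread;  // declared last: starts after the other members exist
};
```

The destructor body runs before any member is destroyed, so the join guarantees the loop has finished its last getEvents call while m_engine is still alive. If your threads are created some other way (for example with pthread_create), the same ordering applies: stop, join, then free.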