I have a very strange crash on an ARM Linux platform caused by simple code. One problem is that it reproduces rarely (about once a day); another is that it crashes in a place where it seemingly cannot.
Let's start with the C++ code. The thread function does this:
event_obj events[EVENTS_MAX]; // EVENTS_MAX = 32
int num = 0;
m_engine->getEvents(events, &num);
m_engine is a pointer to an abstract base class, which has only one implementation at the moment. getEvents is a pure virtual method.
After some changes, getEvents does nothing but this:
int engine::getEvents(event_obj*, int* num)
{
if (num != nullptr)
{
*num = 0; // SEGMENTATION FAULT
}
return 1; // ok
}
The SEGFAULT happens when trying to store 0 through num. At first I thought it was stack corruption, but after checking the generated assembly it seems that nothing is stored on the stack here. This method doesn't even have stack protection generated (-fstack-protector-strong is enabled), and both parameters are passed in registers r1 and r2. Here is the code for the function call:
event_obj events[EVENTS_MAX];
int num = 0;
236f8: 2300 movs r3, #0
236fa: ac06 add r4, sp, #24
236fc: 9306 str r3, [sp, #24]
m_engine->getEvents(events, &num);
236fe: 6803 ldr r3, [r0, #0]
23700: 691b ldr r3, [r3, #16]
23702: 4622 mov r2, r4
23704: a90c add r1, sp, #48 ; 0x30
23706: 4798 blx r3
and the code for the function itself:
int engine::getEvents(event_obj*, int* num)
{
if (num != nullptr)
251f8: 4613 mov r3, r2
251fa: b10a cbz r2, 25200 <_Z18engine_thread_funcPv+0x9e0>
{
*num = 0;
251fc: 2200 movs r2, #0
251fe: 601a str r2, [r3, #0]
}
return 1; // ok
}
25200: 2001 movs r0, #1
25202: 4770 bx lr
return 1; // ok
}
As you can see from the generated code, the pointers are put into registers r1 and r2:
23702: 4622 mov r2, r4
23704: a90c add r1, sp, #48 ; 0x30
Even if the stack is corrupted, that could corrupt the value of the num variable, but how could it corrupt a pointer held in a register? Also, from the crash log I can see that the LR address is wrong:
CRASH signal 11 Segmentation fault address 0xf0000000 PC 0x251fe LR 0x6c3c533c
The only thing I cannot see from here is the target address of the jump (blx r3), because the called method is virtual. I have one very unlikely theory: instead of jumping to the first instruction of the virtual method body, it jumped a few instructions before that and corrupted the registers, but I don't see how that is possible. Also, it always crashes at the same line, even after changing the code, which is very strange.
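One way to get at the missing jump target is to log the object's vtable pointer just before the call. The following is only a sketch: read_vptr, log_vptr, probe_base, probe_a and probe_b are hypothetical names, and it assumes the layout visible in the disassembly above (ldr r3, [r0, #0]), where the vtable pointer is the first word of the object. The C++ standard does not guarantee that layout; it is what GCC produces here.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <type_traits>

// Reads the first pointer-sized word of a polymorphic object.
// ASSUMPTION: the vtable pointer lives at offset 0, as in the
// disassembly above. This is ABI-specific, not standard C++.
template <typename T>
std::uintptr_t read_vptr(const T* obj) {
    static_assert(std::is_polymorphic<T>::value,
                  "T must have virtual functions");
    assert(obj != nullptr);
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, static_cast<const void*>(obj), sizeof(vptr));
    return vptr;
}

// Logs the object address and its vtable pointer to stderr so that a
// later crash log can be compared against a known-good value.
template <typename T>
void log_vptr(const char* tag, const T* obj) {
    std::fprintf(stderr, "%s: object=%p vptr=0x%" PRIxPTR "\n",
                 tag, static_cast<const void*>(obj), read_vptr(obj));
}

// Hypothetical classes used only to demonstrate the helpers.
struct probe_base {
    virtual ~probe_base() = default;
    virtual int id() const = 0;
};
struct probe_a : probe_base {
    int id() const override { return 1; }
};
struct probe_b : probe_base {
    int id() const override { return 2; }
};
```

Calling log_vptr("before getEvents", m_engine) right before the virtual call would show whether the value still matches the one seen at startup. A different value would suggest that the object itself was overwritten or freed, rather than the stack.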
Can someone suggest something to try? Any ideas?
Thanks in advance.
The fault occurs because engine is no longer valid. The object containing engine has probably been deallocated, i.e. your thread's memory is gone. As such, engine->getEvents is not even valid in memory. Something happened somewhere else in your code, and the threads should have stopped running and exited. They haven't. This is much like a callback into an application that is exiting.
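If that is what is happening, the usual remedy is to make sure the thread is stopped and joined before the engine is destroyed. Below is a minimal sketch of that idea, not the asker's actual code: engine_base, engine_worker, stop() and calls() are hypothetical names, and event_obj is an empty stand-in for the real type.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

struct event_obj {};  // hypothetical stand-in for the real event type

class engine_base {   // hypothetical abstract base
public:
    virtual ~engine_base() = default;
    virtual int getEvents(event_obj* events, int* num) = 0;
};

class engine : public engine_base {
public:
    int getEvents(event_obj*, int* num) override {
        if (num != nullptr) {
            *num = 0;
        }
        return 1;  // ok
    }
};

// Owns both the engine and the thread that calls into it, so the engine
// cannot be destroyed while the thread is still running.
class engine_worker {
public:
    explicit engine_worker(std::unique_ptr<engine_base> e)
        : m_engine(std::move(e)), m_thread([this] { run(); }) {}

    // Stop and join first; only then are the members destroyed.
    ~engine_worker() { stop(); }

    void stop() {
        m_stop.store(true);
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

    int calls() const { return m_calls.load(); }

private:
    void run() {
        assert(m_engine != nullptr);
        constexpr int EVENTS_MAX = 32;
        while (!m_stop.load()) {
            event_obj events[EVENTS_MAX];
            int num = 0;
            m_engine->getEvents(events, &num);
            m_calls.fetch_add(1);
            std::this_thread::yield();
        }
    }

    std::unique_ptr<engine_base> m_engine;
    std::atomic<bool> m_stop{false};
    std::atomic<int> m_calls{0};
    std::thread m_thread;  // declared last: starts after the other members exist
};
```

The key design choice is that the destructor joins the thread before any member is torn down, so getEvents can never run against freed memory. If the engine is owned elsewhere in the real code, the same ordering still has to hold: signal the thread, join it, then delete the engine.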