I would like to know whether strace can cause anomalous behavior in the program it is tracing.
Currently, I am trying to track down a random segmentation fault that occurs on a line where I call pthread_cond_wait() (but the program never seems to crash that way when I use strace).
When I run my program directly (it is actually a mix of C and C++), it sometimes works as it is supposed to, but as mentioned before, it sometimes crashes at pthread_cond_wait() (by the way, if anyone wants to help me with that problem, see here; any help would be much appreciated).
If I directly run my program and attach strace to the process like this:
strace -ttTD -o strace_today.txt -p PROCESS_ID
The output is a single line saying that it is waiting on a futex (effectively like this):
futex(x,FUTEX_WAIT_PRIVATE,x)
If I run my program from strace like this:
strace -ttTD -o strace_today.txt example_program
Then at some point in the output file, to be precise when I call pthread_cond_wait(), it keeps spamming multiple lines like these (and every time, the value the futex() call is waiting for is higher than before; here it is 15):
12:46:15.636366 semop(11599962, {{0, -1, 0}}, 1) = 0 <0.000031>
12:46:15.636512 futex(0x8053838, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000033>
12:46:15.636637 futex(0x8053864, FUTEX_WAIT_PRIVATE, 15, NULL) = ? ERESTARTSYS (To be restarted) <0.002034>
12:46:15.638832 futex(0x8053864, FUTEX_WAIT_PRIVATE, 15, NULL) = 0 <0.001449>
12:46:15.640436 clone(child_stack=0xb6cd0484, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb6cd0bd8, {entry_number:6, base_addr:0xb6cd0b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb6cd0bd8) = 25403 <0.000045>
12:46:15.640598 semop(11599962, {{0, -1, 0}}, 1) = 0 <0.000015>
I also tried running strace as a child rather than the parent of the process (in the hope that it would make a difference). Even though I was trying to catch that random segmentation fault, it never occurred.
Now my question is whether this is common and intended, or whether my strace invocation is bogus. If it is not, are there any syscalls I need to be aware of because they might not work under strace, or does this strange behavior concern a whole group of syscalls? Is there any way around this?
I am using Debian squeeze, in case that is relevant.
Update 1
I totally forgot to mention that I am running multiple threads (POSIX threads) and a few child processes. The pthread_cond_wait() call should not encounter any race, though, since it is definitely the first call after a pthread_mutex_lock() that accesses the pthread_cond_t and pthread_mutex_t which I am passing as arguments. But I do not know whether there might be race conditions inside pthread_cond_wait() itself. I will provide program code if necessary.
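The most likely cause of problems like this is that strace influences the timing of your application, which can expose (or, as in your case, hide) locking bugs. strace stops the traced process at every syscall entry and exit, so threads that make many syscalls run noticeably slower and interleave differently than they do without tracing.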
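Since the actual code is not shown, the following is only a minimal sketch of the usual pthread_cond_wait() pattern, for comparing against your own code; it may or may not match the bug you are hitting. The names `struct mailbox`, `producer` and `run_demo` are made up for illustration. The points it tries to show are that the predicate is changed and tested only while holding the mutex, that the wait sits in a loop, and that the mutex and condition variable stay alive (and are destroyed only) after every thread that uses them has finished. A crash inside pthread_cond_wait() is often a sign that one of these objects was uninitialized, already destroyed, or out of scope.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state; all names here are illustrative only. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cv;
    bool            ready;   /* the predicate, guarded by lock */
    int             value;
};

static void *producer(void *arg)
{
    struct mailbox *mb = arg;

    pthread_mutex_lock(&mb->lock);
    mb->value = 42;
    mb->ready = true;                    /* change the predicate under the lock */
    pthread_cond_signal(&mb->ready_cv);
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Waits for the producer thread and returns the value it published. */
int run_demo(void)
{
    struct mailbox mb = { .ready = false, .value = 0 };
    pthread_t tid;
    int rc, result;

    /* Initialize both objects before any thread can touch them. */
    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.ready_cv, NULL);

    rc = pthread_create(&tid, NULL, producer, &mb);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&mb.lock);
    while (!mb.ready)                    /* loop: wakeups may be spurious */
        pthread_cond_wait(&mb.ready_cv, &mb.lock);
    result = mb.value;
    pthread_mutex_unlock(&mb.lock);

    /* mb lives on this stack frame, so join before it goes away. */
    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.ready_cv);
    pthread_mutex_destroy(&mb.lock);
    return result;
}
```

Calling `run_demo()` from a `main()` and building with `gcc -pthread` should be enough to try it. Because tracing changes the timing, it may be easier to catch the original crash by enabling core dumps and inspecting the core in gdb, or by running the program under a thread checker such as Valgrind's Helgrind tool, than by using strace.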