While working on this question, I came across a possible idea that uses ptrace, but I can't get a proper understanding of how ptrace interacts with threads.
Suppose I have a multithreaded main process, and I want to attach to a specific thread in it (perhaps from a forked child).
Can I attach to a specific thread? (The manuals diverge on this question.)
If so, does that mean that single-stepping only steps through that one thread's instructions? Does it stop all the process's threads?
If so, do all the other threads remain stopped while I call PTRACE_SYSCALL or PTRACE_SINGLESTEP, or do all threads continue? Is there a way to step forward in only one single thread while guaranteeing that the other threads remain stopped?
Basically, I want to synchronise the original program by forcing all threads to stop, and then only execute a small set of single-threaded instructions by single-stepping the one traced thread.
My personal attempts so far look a bit like this:
```c
pid_t target = syscall(SYS_gettid); // get the calling thread's ID
pid_t pid = fork();
if (pid > 0)
{
    waitpid(pid, NULL, 0); // synchronise main process
    important_instruction();
}
else if (pid == 0)
{
    // ptrace() takes the request first: ptrace(request, pid, addr, data)
    ptrace(PTRACE_ATTACH, target, NULL, NULL); // does this work?
    // cancel parent's "waitpid" call, e.g. with a signal
    // single-step to execute "important_instruction()" above
    ptrace(PTRACE_DETACH, target, NULL, NULL); // parent's threads resume?
    _Exit(0);
}
```
However, I'm not sure (and can't find suitable references) whether this is correct under concurrency, and whether important_instruction() is guaranteed to execute only while all other threads are stopped. I also understand that there may be race conditions when the parent receives signals from elsewhere, and I've heard that I should use PTRACE_SEIZE instead, but that doesn't seem to be available everywhere.
Any clarification or references would be greatly appreciated!
I wrote a second test case. I had to add a separate answer, since it was too long to fit into the first one with example output included.
First, here is tracer.c:
tracer.c executes the specified command and waits for the command to receive a SIGSTOP signal. (tracer.c does not send it itself; you can either have the tracee stop itself, or send the signal externally.)
When the command has stopped, tracer.c attaches to every thread with ptrace, and single-steps one of the threads a fixed number of steps (the SINGLESTEPS compile-time constant), showing the pertinent register state for each thread.
After that, it detaches from the command and sends it a SIGCONT signal to let it continue its operation normally.
Here is a simple test program, worker.c, that I used for testing:
Compile both using e.g.
and run either in a separate terminal, or in the background, using e.g.
The tracer shows the PID of the worker:
At this point, the child is running normally. The action starts when you send a SIGSTOP to the child. The tracer detects it, does the desired tracing, then detaches and lets the child continue normally:
You can repeat the above as many times as you wish. Note that I picked the SIGSTOP signal as the trigger because this way tracer.c is also useful as a basis for generating complex multithreaded core dumps on request (the multithreaded process can simply trigger it by sending itself a SIGSTOP).
Here is the disassembly of the worker() function, in which all the threads are spinning in the above example:
Now, this test program only shows how to stop a process, attach to all of its threads, single-step one of the threads a desired number of instructions, and then let all the threads continue normally; it does not yet prove that the same applies to letting specific threads continue normally (via PTRACE_CONT). However, the detail I describe below indicates, to me, that the same approach should work fine for PTRACE_CONT.
The main problem or surprise I encountered while writing the above test programs was the necessity of the
loop, especially for the ESRCH case (the others I only added because of the ptrace man page description).
You see, most ptrace commands are only allowed when the task is stopped. However, the task is not stopped while it is still completing e.g. a single-step command. Thus, using the above loop (perhaps adding a millisecond nanosleep or similar to avoid wasting CPU) makes sure the previous ptrace command has completed, and thus the task has stopped, before we try to supply the new one.
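A minimal sketch of such a retry loop, assuming Linux and glibc. The helper name ptrace_retry(), the max_attempts bound and the 1 ms delay are illustrative choices of mine (hypothetical, not taken from tracer.c). It clears errno before every call because PTRACE_PEEK* requests can legitimately return -1, and it only retries on the ESRCH, EBUSY and EFAULT errors mentioned above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: issue a ptrace() request, retrying while the
   target task is not (yet) in a ptrace-stop. ESRCH is what the kernel
   reports while the tracee is still running, e.g. still completing a
   previous single-step; EBUSY and EFAULT are retried too, mirroring
   the loop described above. */
long ptrace_retry(int request, pid_t tid, void *addr, void *data,
                  int max_attempts)
{
    const struct timespec delay = { 0, 1000000L }; /* 1 ms between tries */
    long r = -1L;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (attempt > 0)
            nanosleep(&delay, NULL);   /* avoid burning CPU while waiting */

        errno = 0;                     /* PEEK requests may return -1 legitimately */
        r = ptrace(request, tid, addr, data);
        if (r != -1L || errno == 0)
            return r;                  /* success */
        if (errno != ESRCH && errno != EBUSY && errno != EFAULT)
            return r;                  /* a real error: retrying will not help */
    }
    return r;                          /* gave up; errno is from the last ptrace() */
}
```

For example, ptrace_retry(PTRACE_SINGLESTEP, tid, NULL, NULL, 1000) gives up after roughly a second of retrying.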
Kerrek SB, could at least some of the troubles you've had with your test programs be due to this issue? To me, personally, it was a D'oh! moment to realize that of course this is necessary, as ptracing is inherently asynchronous, not synchronous.
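The conventional way to make that asynchrony synchronous is to block in waitpid() on the specific thread until it reports its stop, before issuing the next request. A minimal sketch, assuming Linux; wait_for_stop() is a hypothetical helper name. The __WALL flag makes waitpid() report on "clone" children (threads) as well, which a tracer attached to non-leader threads needs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until the traced task tid reports a stop.
   Returns the stop signal (e.g. SIGTRAP after a PTRACE_SINGLESTEP), or -1
   if waitpid() failed or the task exited or was killed instead. */
int wait_for_stop(pid_t tid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(tid, &status, __WALL);
    } while (r == -1 && errno == EINTR);   /* interrupted by a handler: retry */

    if (r == -1)
        return -1;                  /* e.g. ECHILD: tid is not our tracee */
    if (WIFSTOPPED(status))
        return WSTOPSIG(status);    /* in a stop: safe to issue the next request */
    return -1;                      /* WIFEXITED or WIFSIGNALED: the task is gone */
}
```

For example, ptrace(PTRACE_SINGLESTEP, tid, NULL, NULL) followed by a check that wait_for_stop(tid) == SIGTRAP ensures that single step has finished before anything else is requested from that thread.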
(This asynchronicity is also the cause of the SIGCONT/PTRACE_CONT interaction I mentioned above. I believe that, with proper handling using the loop shown above, that interaction is no longer a problem, and is actually quite understandable.)
Adding to the comments to this answer:
The Linux kernel uses a set of task state flags in the task_struct structure (see include/linux/sched.h for the definition) to keep track of the state of each task. The userspace-facing side of ptrace() is defined in kernel/ptrace.c.
When PTRACE_SINGLESTEP or PTRACE_CONT is called, kernel/ptrace.c:ptrace_continue() handles most of the details. It finishes by calling wake_up_state(child, __TASK_TRACED) (kernel/sched/core.c::try_to_wake_up(child, __TASK_TRACED, 0)).
When a process is stopped via a SIGSTOP signal, all of its tasks are stopped and end up in the "stopped, not traced" state.
Attaching to every task (via PTRACE_ATTACH or PTRACE_SEIZE, see kernel/ptrace.c:ptrace_attach()) modifies the task state. However, the ptrace state bits (the PT_ constants in include/linux/ptrace.h) are separate from the task runnable state bits (the TASK_ constants in include/linux/sched.h).
After attaching to the tasks and sending the process a SIGCONT signal, the stopped state is not immediately modified (I believe), since the task is also being traced. Doing PTRACE_SINGLESTEP or PTRACE_CONT ends up in kernel/sched/core.c::try_to_wake_up(child, __TASK_TRACED, 0), which updates the task state and moves the task to the run queue.
Now, the complicated part, for which I haven't yet found the code path, is how the task state gets updated in the kernel when the task is next scheduled. My tests indicate that with single-stepping (which is yet another task state flag), only the task state gets updated, with the single-step flag cleared. It seems that PTRACE_CONT is not as reliable; I believe this is because the single-step flag "forces" that task state change. Perhaps there is a "race condition" between the continue-signal delivery and the state change?
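Because ptrace attaches to individual tasks (threads) rather than to whole processes, "attaching to every task" means enumerating the thread IDs under /proc/PID/task and attaching to each one separately. A minimal sketch, assuming Linux 3.4 or later for PTRACE_SEIZE; list_tasks() and seize_tasks() are hypothetical helper names. Threads created after the directory scan are missed, so this is only reliable on a process that is already stopped (as in the answer), or with repeated rescans.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store up to max_tids thread IDs of process pid,
   read from the /proc/PID/task directory. Returns the count, or -1 if
   the directory cannot be opened (errno is set by opendir()). */
int list_tasks(pid_t pid, pid_t *tids, int max_tids)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);

    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    int n = 0;
    struct dirent *ent;
    while (n < max_tids && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] < '0' || ent->d_name[0] > '9')
            continue;                     /* skip "." and ".." */
        tids[n++] = (pid_t)strtol(ent->d_name, NULL, 10);
    }
    closedir(dir);
    return n;
}

/* Hypothetical helper: attach to each listed thread with PTRACE_SEIZE.
   Unlike PTRACE_ATTACH, this sends no SIGSTOP and does not stop a running
   thread by itself. Returns how many of the n threads were seized. */
int seize_tasks(const pid_t *tids, int n)
{
    int seized = 0;
    for (int i = 0; i < n; i++)
        if (ptrace(PTRACE_SEIZE, tids[i], NULL, NULL) == 0)
            seized++;
    return seized;
}
```

A tracer would call list_tasks() on the stopped process, pass the result to seize_tasks(), and compare the returned count with n to notice threads that exited in between.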
(Further edit: the kernel developers definitely expect wait() to be called; see for example this thread.)
In other words, after noticing that the process has stopped (note that you can use /proc/PID/stat or /proc/PID/status if the process is not a child and not yet attached to), I believe the following procedure is the most robust one:
After the above, all tasks should be attached and in the expected state, so that e.g. PTRACE_CONT works without further tricks.
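Checking a thread's state through /proc, as suggested above, only needs the third field of /proc/PID/task/TID/stat: 'T' means stopped (e.g. by SIGSTOP), and 't' means tracing stop (reported separately since Linux 2.6.33). A minimal sketch; task_state() is a hypothetical helper name. The second field, the command name, is in parentheses and may itself contain spaces or parentheses, so the parser looks for the last ')'.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state of thread tid of
   process pid from /proc/PID/task/TID/stat, or '?' on error.
   'R' running, 'S' sleeping, 'T' stopped, 't' tracing stop, ... */
char task_state(pid_t pid, pid_t tid)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/task/%d/stat", (int)pid, (int)tid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t len = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[len] = '\0';

    /* Format: "TID (comm) S ..."; comm may contain ')', so use the last one. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

A tracer could poll task_state() for each TID until it reports 'T' before attaching.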
If the behaviour changes in future kernels (I do believe the interaction between the STOP/CONT signals and ptracing is something that might change; at the very least, a question to the LKML developers about this behaviour would be warranted!), the above procedure will still work robustly. (Erring on the side of caution by using a loop to PTRACE_SINGLESTEP a few times might also be a good idea.)
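A minimal sketch of such a cautious single-step loop for one already attached, ptrace-stopped thread, assuming Linux; single_step() is a hypothetical helper name. Each PTRACE_SINGLESTEP is followed by a waitpid() on that same TID, so the next request is only issued once the thread is back in a ptrace-stop. Requests aimed at this TID resume only this thread; other traced threads stay in their ptrace-stops.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: single-step the traced thread tid up to `steps`
   times. Returns the number of completed steps, or -1 if a request or
   waitpid() failed (e.g. tid is not our stopped tracee) or the thread died. */
int single_step(pid_t tid, int steps)
{
    for (int done = 0; done < steps; done++) {
        if (ptrace(PTRACE_SINGLESTEP, tid, NULL, NULL) == -1L)
            return -1;          /* not our tracee, or not in a ptrace-stop */

        int status;
        pid_t r;
        do {
            r = waitpid(tid, &status, __WALL);  /* block until this step completes */
        } while (r == -1 && errno == EINTR);

        if (r == -1 || !WIFSTOPPED(status))
            return -1;          /* waitpid failed, or the thread exited or was killed */
        if (WSTOPSIG(status) != SIGTRAP)
            return done;        /* stopped by some other signal: the caller must handle it */
    }
    return steps;
}
```

Returning early on a non-SIGTRAP stop leaves the decision of whether to redeliver that signal (via the data argument of the next restarting request) to the caller.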
The difference from PTRACE_CONT is that if the behaviour changes in the future, the initial PTRACE_CONT might actually continue the process, causing the ptrace() calls that follow it to fail. With PTRACE_SINGLESTEP, the process will stop, allowing further ptrace() calls to succeed.
Questions?