While I was working on this question, I came across a possible idea that uses ptrace, but I'm unable to get a proper understanding of how ptrace interacts with threads.
Suppose I have a given, multithreaded main process, and I want to attach to a specific thread in it (perhaps from a forked child).
Can I attach to a specific thread? (The manuals diverge on this question.)
If so, does that mean that single-stepping only steps through that one thread's instructions? Does it stop all the process's threads?
If so, do all the other threads remain stopped while I call PTRACE_SYSCALL or PTRACE_SINGLESTEP, or do all threads continue? Is there a way to step forward only in one single thread but guarantee that the other threads remain stopped?
Basically, I want to synchronise the original program by forcing all threads to stop, and then only execute a small set of single-threaded instructions by single-stepping the one traced thread.
My personal attempts so far look a bit like this:
    pid_t target = syscall(SYS_gettid);   // get the calling thread's ID
    pid_t pid = fork();

    if (pid > 0)
    {
        waitpid(pid, NULL, 0);            // synchronise main process
        important_instruction();
    }
    else if (pid == 0)
    {
        // ptrace() takes the request first, then the thread ID
        ptrace(PTRACE_ATTACH, target, NULL, NULL);    // does this work?
        // cancel parent's "waitpid" call, e.g. with a signal
        // single-step to execute "important_instruction()" above
        ptrace(PTRACE_DETACH, target, NULL, NULL);    // parent's threads resume?
        _Exit(0);
    }
However, I'm not sure (and can't find suitable references) that this is correct under concurrency and that important_instruction() is guaranteed to be executed only when all other threads are stopped. I also understand that there may be race conditions when the parent receives signals from elsewhere, and I have heard that I should use PTRACE_SEIZE instead, but that doesn't seem to exist everywhere.
Any clarification or references would be greatly appreciated!
I wrote a second test case. I had to add a separate answer, since it was too long to fit into the first one with example output included.
First, here is tracer.c:

tracer.c executes the specified command, waiting for the command to receive a SIGSTOP signal. (tracer.c does not send it itself; you can either have the tracee stop itself, or send the signal externally.)

When the command has stopped, tracer.c attaches a ptrace to every thread, and single-steps one of the threads a fixed number of steps (the SINGLESTEPS compile-time constant), showing the pertinent register state for each thread.

After that, it detaches from the command, and sends it a SIGCONT signal to let it continue its operation normally.

Here is a simple test program, worker.c, I used for testing:

Compile both using e.g.

and run either in a separate terminal, or in the background, using e.g.
The tracer shows the PID of the worker:
At this point, the child is running normally. The action starts when you send a SIGSTOP to the child. The tracer detects it, does the desired tracing, then detaches and lets the child continue normally:

You can repeat the above as many times as you wish. Note that I picked the SIGSTOP signal as the trigger, because this way tracer.c is also useful as a basis for generating complex multithreaded core dumps per request (as the multithreaded process can simply trigger it by sending itself a SIGSTOP).

The disassembly of the worker() function the threads are all spinning in, in the above example:

Now, this test program only shows how to stop a process, attach to all of its threads, single-step one of the threads a desired number of instructions, and then let all the threads continue normally; it does not yet prove that the same applies for letting specific threads continue normally (via PTRACE_CONT). However, the detail I describe below indicates, to me, that the same approach should work fine for PTRACE_CONT.

The main problem or surprise I encountered while writing the above test programs was the necessity of the

loop, especially for the ESRCH case (the others I only added due to the ptrace man page description).

You see, most ptrace commands are only allowed when the task is stopped. However, the task is not stopped while it is still completing, for example, a single-step command. Thus, using the above loop (perhaps adding a millisecond nanosleep or similar to avoid wasting CPU) makes sure the previous ptrace command has completed, and thus the task has stopped, before we try to supply the new one.
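A minimal sketch of such a retry loop, under stated assumptions: only the ESRCH case is retried (the other error codes the loop handles are not spelled out here), the pause is the one-millisecond nanosleep suggested above, and the helper name ptrace_retry() and its max_tries bound are hypothetical, not taken from tracer.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: repeat a ptrace request while it fails with ESRCH,
 * which for an attached task typically means "not in a ptrace-stop yet".
 * max_tries bounds the loop, so a task that has really exited does not
 * make the tracer spin forever. Returns whatever the last ptrace() call
 * returned; errno is left as set by that call. */
static long ptrace_retry(int request, pid_t tid, void *addr, void *data,
                         int max_tries)
{
    const struct timespec pause = { 0, 1000000L };  /* 1 ms */
    long result;
    int tries = 0;

    assert(max_tries > 0);

    for (;;) {
        errno = 0;
        result = ptrace(request, tid, addr, data);

        /* Success, or an error other than ESRCH: do not retry. */
        if (!(result == -1L && errno == ESRCH))
            return result;

        /* Still ESRCH after the last allowed attempt: give up. */
        if (++tries >= max_tries)
            return result;

        nanosleep(&pause, NULL);
    }
}
```

For example, `ptrace_retry(PTRACE_SINGLESTEP, tid, NULL, NULL, 1000)` would wait up to roughly one second for the previous request on `tid` to complete before giving up.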
Kerrek SB, I do believe at least some of the troubles you've had with your test programs are due to this issue? To me, personally, it was a kind of a D'oh! moment to realize that of course this is necessary, as ptracing is inherently asynchronous, not synchronous.
(This asynchronicity is also the cause for the SIGCONT-PTRACE_CONT interaction I mentioned above. I do believe that with proper handling using the loop shown above, that interaction is no longer a problem, and is actually quite understandable.)

Adding to the comments to this answer:
The Linux kernel uses a set of task state flags in the task_struct structure (see include/linux/sched.h for the definition) to keep track of the state of each task. The userspace-facing side of ptrace() is defined in kernel/ptrace.c.

When PTRACE_SINGLESTEP or PTRACE_CONT is called, kernel/ptrace.c:ptrace_continue() handles most of the details. It finishes by calling wake_up_state(child, __TASK_TRACED) (kernel/sched/core.c::try_to_wake_up(child, __TASK_TRACED, 0)).

When a process is stopped via a SIGSTOP signal, all tasks will be stopped, and end up in the "stopped, not traced" state.

Attaching to every task (via PTRACE_ATTACH or PTRACE_SEIZE, see kernel/ptrace.c:ptrace_attach()) modifies the task state. However, ptrace state bits (see include/linux/ptrace.h:PT_ constants) are separate from the task runnable state bits (see include/linux/sched.h:TASK_ constants).

After attaching to the tasks, and sending the process a SIGCONT signal, the stopped state is not immediately modified (I believe), since the task is also being traced. Doing PTRACE_SINGLESTEP or PTRACE_CONT ends up in kernel/sched/core.c::try_to_wake_up(child, __TASK_TRACED, 0), which updates the task state, and moves the task to the run queue.

Now, the complicated part, for which I haven't yet found the code path, is how the task state gets updated in the kernel when the task is next scheduled. My tests indicate that with single-stepping (which is yet another task state flag), only the task state gets updated, with the single-step flag cleared. It seems that PTRACE_CONT is not as reliable; I believe it is because the single-step flag "forces" that task state change. Perhaps there is a "race condition" with respect to the continue signal delivery and the state change?
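Seen from user space, the stop that ends a single-step can be collected with waitpid() before the next request is issued. The following is only a sketch, assuming the task is already attached and in a ptrace-stop; single_step_n() is a hypothetical helper, not part of tracer.c, and it does not handle waitpid() being interrupted by a signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: single-step the attached, stopped task `tid` up to
 * `count` times, waiting for the resulting stop after every step.
 * Returns the number of steps that ended in a SIGTRAP stop. */
static int single_step_n(pid_t tid, int count)
{
    int done = 0;

    assert(count >= 0);

    while (done < count) {
        int status = 0;

        /* Ask the kernel to run exactly one instruction of this task. */
        if (ptrace(PTRACE_SINGLESTEP, tid, NULL, NULL) == -1L)
            break;

        /* Wait until the step has completed. __WALL lets waitpid() also
         * report tasks that are not the thread-group leader. */
        if (waitpid(tid, &status, __WALL) != tid)
            break;

        /* Anything other than a SIGTRAP stop (exit, kill, another
         * signal) ends the stepping. */
        if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
            break;

        done++;
    }
    return done;
}
```

Because every step is followed by a wait, the next ptrace request is only issued once the task is known to be stopped again, so no polling is needed on this path.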
(Further edit: the kernel developers definitely expect wait() to be called, see for example this thread.)

In other words, after noticing that the process has stopped (note that you can use /proc/PID/stat or /proc/PID/status if the process is not a child, and not yet attached to), I believe the following procedure is the most robust one:

After the above, all tasks should be attached and in the expected state, so that e.g. PTRACE_CONT works without further tricks.
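The /proc/PID/stat check mentioned in the parenthesis above could look roughly like this. It is a sketch only: proc_state() is a hypothetical helper name, and it relies on the state character being the first field after the parenthesised command name, as described in the proc(5) manual page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state character from /proc/PID/stat
 * (for example 'R' running, 'S' sleeping, 'T' stopped, 't' tracing stop),
 * or -1 if the file cannot be read or parsed. */
static int proc_state(pid_t pid)
{
    char path[64];
    char buf[1024];
    char *p;
    FILE *f;
    size_t n;

    assert(pid > 0);

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is enclosed in parentheses and may itself contain
     * spaces or ')', so look for the last ')' rather than the first. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return -1;

    return (unsigned char)p[2];
}
```

A tracer could poll `proc_state(pid) == 'T'` to notice that a non-child process has stopped. For an individual thread, the same format is available under /proc/PID/task/TID/stat.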
If the behaviour changes in future kernels, the above procedure will still work robustly. (I do believe the interaction between the STOP/CONT signals and ptracing is something that might change; at least a question to the LKML developers about this behaviour would be warranted! Erring on the side of caution, by using a loop to PTRACE_SINGLESTEP a few times, might also be a good idea.)

The difference to PTRACE_CONT is that if the behaviour changes in the future, the initial PTRACE_CONT might actually continue the process, causing the ptrace() calls that follow it to fail. With PTRACE_SINGLESTEP, the process will stop, allowing further ptrace() calls to succeed.

Questions?