How to safely `waitpid()` in a plugin when the main program's `SIGCHLD` handler calls `wait()`


I am writing a module for a toolkit which needs to execute some subprocesses and read their output. However, the main program that uses the toolkit may also spawn subprocesses of its own and set up a signal handler for SIGCHLD which calls wait(NULL) to get rid of zombie processes. As a result, if the subprocess I create exits while I am inside my waitpid(), the child is reaped before the signal handler is called, and the wait() in the signal handler then waits for the next process to end (which could take forever). This behavior is described in the man page of waitpid (see guarantee 2), since the Linux implementation doesn't seem to allow the wait() family to handle SIGCHLD. I have tried popen() and posix_spawn() and both of them have the same problem. I have also tried a double fork() so that the direct child exits immediately, but I still cannot guarantee that waitpid() is called after SIGCHLD is received.

My question is: if another part of the program sets up a signal handler which calls wait() (maybe it should call waitpid() instead, but that is not something I can control), is there a way to safely execute child processes without overwriting the SIGCHLD handler (since it might do something useful in some programs) and without leaving any zombie processes?

A small program which shows the problem is here (note that the main program exits only after the long-running child exits, rather than after the short one, which is what it is directly waiting for with waitpid()):

#include <signal.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

static void
signalHandler(int sig)
{
    printf("%s: %d\n", __func__, sig);
    int status;
    int ret = waitpid(-1, &status, 0);
    printf("%s, ret: %d, status: %d\n", __func__, ret, status);
}

int
main()
{
    struct sigaction sig_act;
    memset(&sig_act, 0, sizeof(sig_act));
    sig_act.sa_handler = signalHandler;
    sigaction(SIGCHLD, &sig_act, NULL);

    if (!fork()) {
        sleep(20);
        printf("%s: long run child %d exit.\n", __func__, getpid());
        _exit(0);
    }

    pid_t pid = fork();
    if (!pid) {
        sleep(4);
        printf("%s: %d exit.\n", __func__, getpid());
        _exit(0);
    }
    printf("%s: %d -> %d\n", __func__, getpid(), pid);

    sleep(1);
    printf("%s, start waiting for %d\n", __func__, pid);
    int status;
    int ret = waitpid(pid, &status, 0);
    printf("%s, ret: %d, pid: %d, status: %d\n", __func__, ret, pid, status);

    return 0;
}

There is 1 best solution below


If the process is single-threaded, you can block SIGCHLD temporarily (using sigprocmask()), do the fork()/waitpid(), then unblock it again.

Do not forget to unblock the signal in the forked child. The signal mask is inherited across fork() and exec, and although POSIX does not guarantee that a program starts with an empty mask, most existing programs expect no signals to be blocked.