How is waitpid better than wait when multiple SIGCHLDs are raised simultaneously?

In the book "Unix Network Programming, Volume 1" by Richard Stevens, the section "Difference between wait vs waitpid" says that waitpid() should be used instead of wait(). I understand the problem described for wait(): when multiple child processes terminate simultaneously and multiple SIGCHLDs are raised, the parent may be delivered only the first of them, and the others are lost because the kernel does not queue signals. OK, but how does waitpid avoid this problem?

Below is how the book uses waitpid() in the signal handler:

    while ( (pid = waitpid(-1, &stat, WNOHANG) ) > 0) {
        printf("child %d terminated\n", pid);
    }

There is 1 best solution below

The difficulty is that a SIGCHLD signal only tells you that at least one child process has exited or changed its state. It does not tell you how many wait or waitpid calls are required.

According to the documentation, e.g. https://linux.die.net/man/2/waitpid or https://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html, the call

    pid_t pid = wait(&status);

is equivalent to

    pid_t pid = waitpid(-1, &status, 0);
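To make the equivalence concrete, here is a minimal sketch that forks one child and collects its exit code through either call. The helper name `run_child_and_collect` and its `use_waitpid` switch are made up for this illustration, and it assumes the process has no other children and leaves SIGCHLD at its default disposition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then reap it
 * with wait() or with waitpid(-1, &status, 0). Returns the exit code seen
 * by the parent, or -1 on error. */
static int run_child_and_collect(int code, int use_waitpid)
{
    int status;
    pid_t child, got;

    assert(code >= 0 && code <= 255);   /* exit codes are 8 bits wide */

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(code);                    /* child: terminate immediately */

    /* Both calls block until any child has terminated. */
    got = use_waitpid ? waitpid(-1, &status, 0) : wait(&status);

    if (got != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Under those assumptions, `run_child_and_collect(7, 0)` and `run_child_and_collect(7, 1)` should both return 7.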

Your example

    while ( (pid = waitpid(-1, &stat, WNOHANG) ) > 0) {
        printf("child %d terminated\n", pid);
    }

uses the additional flag WNOHANG, which makes the call non-blocking. This means you can call waitpid repeatedly in a loop until it reports that no more terminated children are available: it returns 0 if children still exist but none has changed state yet, and -1 (with errno set to ECHILD) if no children are left. So you can reap as many processes as have exited so far without knowing their number in advance. After leaving the loop, the parent process can continue its normal processing.
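As a rough sketch of how such a loop might sit in a complete handler, the following installs a SIGCHLD handler with sigaction and forks several children that exit right away. The names `on_sigchld`, `spawn_and_reap` and the `reaped` counter are invented for this example. SIGCHLD is deliberately blocked while forking so that several exits are likely to collapse into one pending signal, and the handler only counts instead of calling printf, which is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo counter: children reaped so far by the handler. */
static volatile sig_atomic_t reaped = 0;

/* SIGCHLD handler: one signal may stand for several exits, so keep
 * calling waitpid until it stops returning a child PID. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;    /* waitpid may overwrite errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Fork n children that exit at once; return how many were reaped. */
static int spawn_and_reap(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int started = 0;

    assert(n > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Hold SIGCHLD back while forking, so several exits are likely to
     * be represented by a single pending signal. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;              /* wait only for children that started */
        if (pid == 0)
            _exit(0);           /* child: terminate immediately */
        started++;
    }

    /* sigsuspend atomically unblocks SIGCHLD and sleeps until a
     * handler has run. */
    while (reaped < started)
        sigsuspend(&wait_mask);

    /* Restore the previous signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return (int)reaped;
}
```

Assuming fork succeeds each time and the process has no other children, `spawn_and_reap(5)` should return 5 no matter how many times the handler was actually invoked.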

In contrast, wait blocks if there is still a running child process that has not exited or changed its state yet. This is what would happen if you called wait in a similar loop: once the terminated children have been reaped, the next call would block until another child exits. wait has no option to make it non-blocking. (It can be interrupted by a signal, though.)
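The difference can be observed with a child that is guaranteed to be still running. In this sketch (the function name `poll_running_child` and the pipe used to keep the child alive are assumptions of the example, as is the parent having no other children), a WNOHANG call returns 0 right away, where a plain wait at the same point would not return until the child exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: poll for terminated children while the only
 * child is still running. Returns the result of the non-blocking
 * waitpid call, or -1 if the pipe or fork could not be set up. */
static pid_t poll_running_child(void)
{
    int fds[2];
    int status;
    pid_t child, polled, got;

    if (pipe(fds) == -1)
        return -1;

    child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        char c;
        close(fds[1]);
        /* Blocks until the parent closes its write end (EOF). */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(fds[0]);

    /* The child is still alive: WNOHANG returns 0 instead of blocking.
     * wait(&status) here would sleep until the child terminates. */
    polled = waitpid(-1, &status, WNOHANG);

    close(fds[1]);                      /* EOF lets the child exit */
    got = waitpid(child, &status, 0);   /* blocking reap, now fine */
    assert(got == child);

    return polled;
}
```

Under these assumptions `poll_running_child()` should return 0, meaning "children exist, but none has terminated yet".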

So waitpid does not avoid the problem of lost signals, but it lets you handle it without blocking your parent process. Whether the non-blocking waitpid is useful or required, or a possibly blocking wait is sufficient, depends on your program.