In the book "Unix Network Programming, Volume 1" by Richard Stevens, in the section "Difference between wait vs waitpid", it says waitpid() should be used instead of wait(). I understand the problem described when using wait(): when multiple child processes terminate simultaneously and hence multiple SIGCHLDs are raised, the parent may get only the first of them delivered and the others would be lost, since the kernel does not queue signals. OK, but how does waitpid() avoid this problem?
Below is how the book uses waitpid() in the signal handler:
while ((pid = waitpid(-1, &stat, WNOHANG)) > 0) {
    printf("child %d terminated\n", pid);
}
The difficulty is that a SIGCHLD signal only tells you that at least one child process has exited or changed its state. You don't know how many wait or waitpid calls are required.

According to the documentation, e.g. https://linux.die.net/man/2/waitpid or https://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html, a call wait(&status) is equivalent to waitpid(-1, &status, 0). Your example, waitpid(-1, &stat, WNOHANG), uses the additional flag WNOHANG, which makes the call non-blocking. This means you can repeatedly call waitpid in a loop until it tells you that there is nothing more to collect: it returns 0 if children still exist but none has changed state, and -1 if there are no children left. So you can reap as many processes as have exited so far without knowing their number. After leaving the loop, the parent process can continue its normal processing.

In contrast to this, wait would block if there is still a running child process that has not exited or changed its state yet. This would happen when you call wait in a similar loop. There is no option to make wait non-blocking in this case. (You could interrupt it by a signal, though.)

So waitpid does not avoid the problem but allows you to handle it without blocking your parent process. It depends on your program whether the non-blocking waitpid is useful or required, or whether a possibly blocking wait is sufficient.