In the book "UNIX Network Programming, Volume 1" by Richard Stevens, in the section "Difference between wait vs waitpid", it says waitpid() should be used instead of wait(). I understand the problem described when using wait(). It says that when multiple child processes terminate simultaneously, and hence multiple SIGCHLDs are raised, the parent may get only the first of them delivered and the others are lost, since the kernel does not queue signals. OK, but how does waitpid() avoid this problem?
Below is how the book uses waitpid() in the signal handler:
while ( (pid = waitpid(-1, &stat, WNOHANG) ) > 0) {
    printf("child %d terminated\n", pid);
}
The difficulty is that a SIGCHLD signal only tells you that at least one child process has exited or changed its state. You don't know how many wait() or waitpid() calls are required.
According to the documentation, e.g. https://linux.die.net/man/2/waitpid or https://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html, a call wait(&stat) is equivalent to waitpid(-1, &stat, 0).
Your example, waitpid(-1, &stat, WNOHANG), uses the additional flag WNOHANG, which makes the call non-blocking. This means you can call waitpid() repeatedly in a loop until it reports that no more terminated children are available (it returns 0 if children exist but none has changed state yet, or -1 with errno set to ECHILD if there are no children left). So you can reap as many processes as have exited so far without knowing their number. After leaving the loop, the parent process can continue its normal processing.
In contrast, wait() would block if there is still a running child process that has not exited or changed its state yet. This would happen when you call wait() in a similar loop. There is no option to make wait() non-blocking in this case. (You could interrupt it by a signal, though.)
So waitpid() does not avoid the problem but allows you to handle it without blocking your parent process. It depends on your program whether the non-blocking waitpid() is useful or required, or whether a possibly blocking wait() is sufficient.