getline() stops working after SIGCHLD is received


I have been experimenting with signals and I am facing a problem I cannot explain.

I have recreated my issue in this simple C program. In a nutshell, I am reading user input in a loop using getline(). The user can fork the process, kill the child process, or exit the main process altogether.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

int counter = 0;

void handler(int signum){
    counter++;
}

int main(){
    int bool = 1;
    char *input;
    size_t  size=100;
    input = malloc(sizeof(char)*100);
    memset(input,'\0',size);
    pid_t id;

    struct sigaction sa;

    do{
        printf("counter=%d\n",counter);
        getline(&input,&size,stdin);
        if( strncmp(input,"fork",4) == 0 ){

            id = fork();
            if( id == 0 ){//child
                while(1) sleep(1);
                free(input);
                return 0;
            }else if( id > 0 ){//parent
                sa.sa_handler = handler;
                sigaction(SIGCHLD, &sa, NULL);
            }else{//fork failed
                free(input); return -1;
            }

        }else if( strncmp(input,"kill",4) == 0 ){
            kill(id,9);
        }else if( strncmp(input,"exit",4) == 0 ){ 
            bool = 0;
        }
        
    }while(bool == 1);

    free(input);
    return 0;
}

The strange thing is that if I fork a child process and then kill it, in other words typing to the stdin:

fork

kill

I get stuck in an infinite loop where the following is printed to stdout indefinitely (which also indicates that SIGCHLD was caught when the child was killed):

counter=1

If I remove the signal handler, everything seems to work fine. I know that getline() uses the read() syscall and that the SIGCHLD signal causes its interruption, but apart from that I am almost certain that in the next iteration the getline() function should work just fine. Does anyone have an explanation for why getline() stops working?

(I am using the gcc compiler and executing the program on Ubuntu 20.04 LTS)


There are 2 best solutions below


On onlinegdb.com I could not always reproduce the problem. Sometimes it seems to work as expected, sometimes I get repeated errors reported by getline.

By setting errno = 0 before calling getline and checking both its return value and errno afterwards, I found that getline repeatedly returns -1. The first failing call sets errno = EINTR (perror reports "Interrupted system call"); on subsequent calls, errno remains 0 ("Success").

    /* ... needs #include <errno.h> for errno ... */
    do{
        printf("counter=%d\n",counter);
        errno = 0;
        if(getline(&input,&size,stdin) < 0)
        {
            static int i = 20; // to avoid endless loop
            perror("getline");
            if(--i == 0) return 1;
        }
    /* ... */

Apparently, in many cases the interrupted read leaves the error indicator set on the input stream stdin, and it stays set on later calls.

This persistent error can be cleared by calling clearerr.

Unfortunately, I have not (yet) found documentation that explains this behavior.

    /* ... */
    do{
        printf("counter=%d\n",counter);
        errno = 0;
        if(getline(&input,&size,stdin) < 0)
        {
            perror("getline");
            if(errno == EINTR)
            {
                //clearerr(stdin); // clearing here would avoid the 2nd error return
            }
            else if(errno == 0)
            {
                clearerr(stdin);
            }
            else
            {
                return 2;
            }
        }
    /* ... */

The reason is that when the read() system call is interrupted (the parent process receives SIGCHLD and read() fails with EINTR), getline() sets the stream's error indicator. This is documented in the POSIX specification of getline:

"If an error occurs, the error indicator for the stream shall be set, and the function shall return -1 and set errno to indicate the error."

If the signal is delivered to the parent before it enters the read() system call, the handler runs before the call starts, so read() never fails with EINTR. That is why you may not always see the infinite loop around the getline() call.

"but apart from that I am almost certain that in the next iteration the getline() function should work just fine."

Once a stream's error indicator is set, it is not cleared automatically on the next call. You have to clear it yourself with clearerr.
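To make the clearerr fix concrete, here is a minimal sketch of a wrapper around getline() that clears the error indicator and retries when the failure was only an interrupted read. The name getline_retry is hypothetical (it is not a library function), and the sketch assumes that retrying is what you want; it does not try to deal with a partially read line.

```c
#define _POSIX_C_SOURCE 200809L  /* makes getline() visible under strict -std= modes */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not part of any library): calls getline() and, if the
 * call failed only because a signal interrupted the underlying read(), clears
 * the stream's error indicator and tries again. */
ssize_t getline_retry(char **line, size_t *cap, FILE *fp)
{
    for (;;) {
        errno = 0;
        ssize_t n = getline(line, cap, fp);
        if (n >= 0)
            return n;               /* got a line */
        if (ferror(fp) && errno == EINTR) {
            clearerr(fp);           /* reset the sticky error indicator */
            continue;               /* and retry the read */
        }
        return -1;                  /* real error or end-of-file */
    }
}
```

In the program from the question, the call getline(&input,&size,stdin) could then be replaced by getline_retry(&input,&size,stdin).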

Note that this behaviour comes from the requirements on getline() and the stdio stream, not from the interrupted read() system call itself. If you called read() directly on the file descriptor STDIN_FILENO in a loop, the next iteration would work as you expected, i.e. there would be no infinite loop.
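For comparison, a rough sketch of the read()-based approach: because a file descriptor carries no sticky error indicator, retrying after EINTR is just a loop. The helper name read_retry is made up for illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read() that simply retries when a signal handler
 * interrupted the call before any data arrived. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;   /* bytes read, 0 at end-of-file, -1 on a real error */
}
```

A call such as read_retry(STDIN_FILENO, buf, sizeof buf) returns raw bytes, so splitting the input into lines is left to the caller.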

Alternatively, you can ask for interrupted system calls to be restarted automatically by setting the SA_RESTART flag:

sa.sa_flags = SA_RESTART;

In that case, read() is restarted automatically after the signal handler returns, so EINTR is handled transparently and never reaches getline().


P.S.: you should initialize sa with:

struct sigaction sa = {0};

and initialise the signal mask to the empty set with sigemptyset:

sigemptyset(&sa.sa_mask);

because you're only setting sa_handler, and the rest of the fields are left uninitialized!
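Putting these pieces together, the handler setup might look roughly like the sketch below. The function name install_sigchld_handler is hypothetical. Declaring the counter as volatile sig_atomic_t is an addition not discussed above; it is the type the C standard guarantees can be safely written from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L  /* makes sigaction and SA_RESTART visible under strict -std= modes */
#include <signal.h>
#include <string.h>

/* Written from the handler, so use the type that is safe for that. */
static volatile sig_atomic_t counter = 0;

static void handler(int signum)
{
    (void)signum;   /* unused */
    counter++;
}

/* Hypothetical helper: installs the SIGCHLD handler with every field of
 * struct sigaction set. Returns 0 on success, -1 on failure (as sigaction). */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);   /* no field is left uninitialized */
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);    /* block no extra signals while the handler runs */
    sa.sa_flags = SA_RESTART;    /* restart an interrupted read() automatically */
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Calling this once before the input loop, rather than after every fork, should be enough, since the disposition stays installed.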