Which type of buffering is used by the two file descriptors returned by `pipe`?


The manual page for pipe (man 2 pipe) says:

pipe() creates a pipe, a unidirectional data channel that can be used for interprocess communication. The array pipefd is used to return two file descriptors referring to the ends of the pipe. pipefd[0] refers to the read end of the pipe. pipefd[1] refers to the write end of the pipe. Data written to the write end of the pipe is buffered by the kernel until it is read from the read end of the pipe. For further details, see pipe(7)

But neither the quotation above nor man 7 pipe mentions which type of buffering is used by the two file descriptors returned by pipe.

And according to this document, there are three types of buffering:

Standard I/O Library Buffering

The stdio library buffers data with the goal of minimizing the number of calls to the read() and write() system calls. There are three different types of buffering used:

Fully (block) buffered. As characters are written to the stream, they are buffered up to the point where the buffer is full. At this stage, the data is written to the file referenced by the stream. Similarly, reads will result in a whole buffer of data being read if possible.

Line buffered. As characters are written to a stream, they are buffered up until the point where a newline character is written. At this point the line of data including the newline character is written to the file referenced by the stream. Similarly for reading, characters are read up to the point where a newline character is found.

Unbuffered. When an output stream is unbuffered, any data that is written to the stream is immediately written to the file to which the stream is associated.

The ANSI C standard dictates that standard input and output should be fully buffered while standard error should be unbuffered. Typically, standard input and output are set so that they are line buffered for terminal devices and fully buffered otherwise.

I did a simple test on Ubuntu 16.04. Here is the code:

#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <string>
#include <thread>
#include <chrono>
#include <array>
#include <iostream>

int
main(int argc, char *argv[])
{
    int pipefd[2];
    pid_t cpid;
    std::array<char, 1024> buf;
    
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }
    cpid = fork();
    if (cpid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (cpid == 0) {    /* Child reads from pipe */
        close(pipefd[1]);          /* Close unused write end */
        ssize_t size;              /* read() returns ssize_t, not int */
        while ((size = read(pipefd[0], buf.data(), buf.size())) > 0)
             std::cout << size << std::endl;
        write(STDOUT_FILENO, "\n", 1);
        close(pipefd[0]);
        _exit(EXIT_SUCCESS);
    } else {            /* Parent writes "hello world" to pipe three times */
        close(pipefd[0]);          /* Close unused read end */
        std::string str{"hello world"};
        for(int i=0; i<3; i++)
        {
            write(pipefd[1], str.c_str(), str.size());
            std::this_thread::sleep_for(std::chrono::seconds(3));
        }
        close(pipefd[1]);          /* Reader will see EOF */
        wait(NULL);                /* Wait for child */
        exit(EXIT_SUCCESS);
    }
}

Here is the output of the aforementioned code snippet:

11
//about three seconds later
11
//about three seconds later

As you can see, the data written to the pipe contains no \n, yet the read end receives a complete string every three seconds. So I think the two file descriptors returned by pipe are neither block buffered nor line buffered. That rules out two of the three types, which leaves only unbuffered.

But man 7 pipe also says:

PIPE_BUF POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)

So, according to this passage, there is indeed a buffer behind the two file descriptors.

So I am really confused about which type of buffering the two file descriptors returned by pipe use. And since there is a buffer behind the file descriptors, how is it that I receive each string on time (i.e. one string every three seconds rather than all three strings together)?

Could anybody shed some light on this matter?

Answer by Thomas

There is buffering in two different places, and it's important not to confuse them.

The document you quoted is about "Standard I/O Library Buffering", which refers to the standard C library (libc). It applies to libc functions like fprintf and fwrite. This buffering happens in userspace, i.e. in the process of your program. Under the hood, when this buffer is flushed, libc invokes write to send the data to the underlying file descriptor.

However, pipe is a direct¹ system call to the kernel, which has nothing to do with the buffering in libc. You can tell the difference because it works with file descriptors, not with FILE*s. The read and write functions you're using in your code are also system calls.

The pipe is still buffered in kernel space, but it makes no sense there to talk about "line buffered" or "block buffered", because there is no flushing step: as soon as a write call returns, the data is available to the read end. If the buffer is full, a write call on a blocking file descriptor simply waits until there is space again. The only way the buffer in the kernel gets drained is through read calls.


¹ Through a thin wrapper which is also in the C library, but that's beside the point.