How do programs in unix know whether the input stream "is over"?

Maybe I'm not using the right terminology in my question, therefore I'll elaborate here with a clear example.

You can pipe the output of a command to the command-line utility fzf, and if that command is still running, fzf lets you know by showing a "loading icon" and refreshing its list live. For a clear example:

for i in `seq 1 5`; do echo $i; sleep 1 ; done | fzf

Now, I'd like to understand how fzf knows that the stream is not yet finished. Related to that, and going a step further, I'd like to be able to do the same thing by passing the data through a file.

for i in `seq 1 5`; do echo $i; sleep 1 ; done > file.txt  &

How can I take the content of file.txt and send it live to fzf? I have tried fzf < file.txt, but it sends a static version of the file, as it is at the moment of writing the command. It does not stay live.


3
KamilCuk On

How do programs in unix know whether the input stream "is over"?

When the read system call returns 0. See man 2 read.

how fzf knows that the stream is not yet finished

fzf calls the read system call and stays inside the system call until there is something to return. When there is something to read, the execution of the program continues.

Interactive behavior is achieved by spawning threads (goroutines, since fzf is written in Go). One thread waits on the read system call. The other displays your "icon" or handles user input from /dev/tty.
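To make the "one thread blocks in read, another stays responsive" idea concrete, here is a minimal sketch in Python. It is not fzf's code; the names read_until_eof and demo are made up for illustration, and an os.pipe() stands in for the shell pipeline.

```python
import os
import threading

def read_until_eof(fd, chunks, done):
    """Block in read() until data arrives; stop when read() returns zero bytes."""
    while True:
        data = os.read(fd, 4096)   # blocks while the pipe is empty but still open
        if not data:               # b"" means every write end has been closed
            break
        chunks.append(data)
    done.set()                     # the "loading" indicator can be turned off now

def demo():
    r, w = os.pipe()
    chunks, done = [], threading.Event()
    reader = threading.Thread(target=read_until_eof, args=(r, chunks, done))
    reader.start()

    os.write(w, b"1\n")
    still_loading = not done.is_set()   # writer still open, so no EOF yet
    os.write(w, b"2\n")
    os.close(w)                         # closing the last write end ends the stream

    reader.join()
    os.close(r)
    return still_loading, b"".join(chunks), done.is_set()
```

The main thread plays the role of the UI here: it can check the done flag at any time without ever blocking on the pipe itself.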

How can I take the content of file.txt and send it live to fzf?

Typically, tail -f is used to "follow" the end of a file while outputting the content.

0
ilkkachu On
for i in `seq 1 5`; do echo $i; sleep 1 ; done | fzf

Now, I'd like to understand how fzf knows that the stream is not yet finished.

The convention is that "end of file" is indicated by a read() system call that returns with zero bytes read. For regular files, this happens when the reader reads at a position that's at or past the end of the data: there's no data after the end, but there's also no error, so the call returns with zero bytes read. Most programs stop reading then and there, since trying again would likely just result in an endless repetition of zero-byte reads.
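A small sketch of that behavior on a regular file, using Python's os.read (a thin wrapper over read(2)). The function name regular_file_eof and the temporary file are assumptions made for the demo only.

```python
import os
import tempfile

def regular_file_eof():
    fd_w, path = tempfile.mkstemp()        # writer's descriptor on a scratch file
    try:
        os.write(fd_w, b"1\n2\n")
        fd_r = os.open(path, os.O_RDONLY)  # independent reader descriptor
        first = os.read(fd_r, 4096)        # everything currently in the file
        at_end = os.read(fd_r, 4096)       # offset at end of data: zero bytes, no error
        os.write(fd_w, b"3\n")             # a writer appends later
        later = os.read(fd_r, 4096)        # the same descriptor now returns new data
        os.close(fd_r)
        return first, at_end, later
    finally:
        os.close(fd_w)
        os.unlink(path)
```

The zero-byte read in the middle is what a program like fzf takes as "done", even though a later read on the same descriptor would have succeeded.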

In a pipeline like the for ... | fzf, a read on the pipe does not report end-of-file as long as the write end is still open in some process. The man page pipe(7) mentions this:

If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2) from the pipe will see end-of-file (read(2) will return 0).

In your case, when the shell process corresponding to the left-hand side of the pipeline finishes and exits, the right-hand side process sees the end-of-file. Until then, it only blocks, waiting for more data.
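The "all file descriptors referring to the write end" part is easy to trip over, so here is a hedged sketch. It uses os.dup to create a second descriptor for the write end (as an inherited descriptor in a child process would be) and select with a zero timeout to ask whether a read would return immediately. The helper names are invented for this example.

```python
import os
import select

def readable_now(fd):
    """True if read(fd) would return immediately (with data or end-of-file)."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)

def pipe_eof_demo():
    r, w = os.pipe()
    w2 = os.dup(w)             # second descriptor for the same write end
    os.close(w)
    before = readable_now(r)   # w2 is still open: a read would block, no EOF yet
    os.close(w2)
    after = readable_now(r)    # no write ends left: a read returns at once
    data = os.read(r, 4096)    # zero bytes, i.e. end-of-file
    os.close(r)
    return before, after, data
```

This is also why a reader can hang forever if some unrelated process still holds a copy of the write end.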

I have tried fzf < file.txt, but it sends a static version of the file, as it is at the moment of writing the command. It does not stay live.

The redirection doesn't "send" anything. It gives fzf a file descriptor that's open for reading the file. fzf itself can then decide whether to read it all immediately, to wait for something in between, or whatever. Most programs read the input more or less immediately, or at least as fast as they can given the processing they need to do on the data, and then stop when there's no more data to read. While it's possible for something new to be written to the file later, there's no way to indicate if and when that would happen, so waiting would be counterproductive.

(In the case of reading from a terminal or a datagram socket, a read of zero bytes might not be an actual end, and even on a regular file, more data could be written to the file later, making a further read return something else.)

Note that if there is a concurrent writer to the file, fzf may well see writes that were made after the file was opened. Of course, that's not easy to encounter unless the file is large or the reader is slow.


If you want to wait for more data to appear in the file, you'll have to persuade your tool to retry reading at some sensible intervals, or use some tool made for the purpose, like tail -f.
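As a rough illustration of "retry reading at sensible intervals", here is a polling sketch in Python. It is not how tail is implemented; the function name follow and the idle_limit parameter are hypothetical, added only so the loop can terminate in a demo.

```python
import os
import time

def follow(path, interval=0.2, idle_limit=None):
    """Yield chunks from path, retrying after each zero-byte read.

    idle_limit (hypothetical, for demos): stop after this many consecutive
    empty reads. With None, the loop runs until the generator is closed.
    """
    fd = os.open(path, os.O_RDONLY)
    idle = 0
    try:
        while idle_limit is None or idle < idle_limit:
            data = os.read(fd, 4096)
            if data:
                idle = 0
                yield data            # hand new data to the consumer
            else:
                idle += 1             # zero bytes: at end of file for now
                time.sleep(interval)  # wait, then try again
    finally:
        os.close(fd)
```

For the original question, the ready-made equivalent is tail -f file.txt | fzf. Since tail -f never closes its output on its own, fzf never sees end-of-file and keeps showing the loading indicator until tail is stopped.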

4
Peter - Reinstate Monica On

Shells open a pipe between the two processes linked by a pipe symbol |. The shells (sh, ksh, zsh, bash) differ a bit in how exactly they implement that (for example: is one process the executing shell itself, and if yes, which one?); but the important thing to realize is that there are two processes communicating through a one-directional pipe. The writing process writes to standard output, which by default is file descriptor 1, and the other one reads from standard input, which by default is file descriptor 0. The operating system buffers data written by one and provides it to the reader on the other end.

In the end, all higher-level, language-specific I/O routines (for example printf or fread from the C standard library) call the low-level I/O system calls. Low-level I/O is one of the functionalities that an operating system provides. The POSIX user-land function read() from unistd.h, which you call from your program, is a wrapper around a kernel system call. The kernel has knowledge about the actual transport mechanism underlying a file descriptor (a pipe, a file, a socket) and about that specific facility's state. It communicates with the hardware through device drivers, which eventually read from actual electric lines and report back up the chain of abstraction until the result reaches user-accessible information like the return value of the read() call and errno. Higher-level functions like fread() evaluate this information and translate it according to their own specification (return a short item count, and make subsequent feof() calls return true).

As Kamil pointed out, the specification of read() says that the end-of-file condition is reported by returning 0 even though more than 0 bytes were requested. If, by contrast, end of file has not been reached and the communication is merely stalled for some reason (a seeking disk, a spotty WLAN), the default read() call blocks until data is available. If the descriptor is in non-blocking mode, the call instead returns -1 and sets errno to EAGAIN. In practice this matters for pipes, sockets and terminals; reads from regular files generally do not honor non-blocking mode.
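The difference between "stalled" and "finished" can be sketched with a non-blocking pipe in Python, where an EAGAIN result from read() surfaces as BlockingIOError. The function name nonblocking_read_demo is made up for this example.

```python
import os

def nonblocking_read_demo():
    r, w = os.pipe()
    os.set_blocking(r, False)      # sets O_NONBLOCK on the read end
    try:
        os.read(r, 4096)           # empty pipe, writer still open
        stalled = False
    except BlockingIOError:        # read() failed with EAGAIN / EWOULDBLOCK
        stalled = True
    os.write(w, b"1\n")
    data = os.read(r, 4096)        # data is available now
    os.close(w)
    eof = os.read(r, 4096)         # writer gone: zero bytes, not an error
    os.close(r)
    return stalled, data, eof
```

So "no data yet" is an error-style result, while end-of-file is a successful read of zero bytes; the two are never confused at the system-call level.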

In your case, the condition for end-of-file for the pipeline between the bash loop and fzf in

for i in $(seq 1 5); do echo $i; sleep 1 ; done | fzf

is that the left side closes the writing end of the pipe. This is reported to the reading side by the kernel, and a pending user-land read() will return 0, possibly after first coming back with fewer bytes than requested (but a short read alone is not a sign of end-of-file!).

When you redirect input to come from a file, things are different. Reaching the end of the file at a given moment, as in your example, is literally end-of-file and is communicated as such. A program like tail -f, which wants to monitor changes to a file, may need to poll the file state or employ operating-system-specific mechanisms that go beyond the lowest common denominator of the POSIX specification, for example inotify on Linux.