tee >(cat -n) < tmpfile prints tmpfile completely before repeating

tmpfile contains the following:

a
b
c
d

The output of the command in the question's title is the following:

[me@localhost somedir]$ tee >(cat -n) < tmpfile    
a
b
c
d
[me@localhost somedir]$      1  a
     2  b
     3  c
     4  d

Since tee and cat are connected via a named pipe, I expected cat to finish writing each line to the terminal before tee printed the next one. Something like this:

[me@localhost somedir]$ tee >(cat -n) < tmpfile    
a
1  a
b
2  b
c
3  c
d
4  d
[me@localhost somedir]$     

Can someone please explain what's happening here? I considered a race condition that tee simply happens to win, but the same thing happens with files a few KB in size, so I suspect there is more to it.

Thanks.

There is 1 best solution below

If you want the other side to win instead, that's easy to arrange (assuming we're using the same implementation of tee, since the specific ordering is implementation-defined rather than standardized):

# note that this uses automatic FD allocation support added in bash 4.1
( exec {orig_stdout}>&1; { tee >(cat >&$orig_stdout) | cat -n; } <<<$'a\nb\nc' )

In short: tee (as implemented by GNU coreutils 8.2.2) writes each chunk first to its stdout, then to each argument in turn, left to right. Note that it works in chunks, not lines: the POSIX spec for tee says it shall not buffer output, which rules out line-oriented output buffering.

You can see that in the implementation:

/* Move all the names 'up' one in the argv array to make room for
   the entry for standard output.  This writes into argv[argc].  */
for (i = nfiles; i >= 1; i--)
  files[i] = files[i - 1];

...then building a descriptors array mapping 1:1 with the array entries in files, and writing to each in turn:

/* Write to all NFILES + 1 descriptors.
   Standard output is the first one.  */
for (i = 0; i <= nfiles; i++)
  if (descriptors[i]
      && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)

As for why the implementation behaves consistently rather than racing: the POSIX specification for tee requires that it not buffer output. Consequently, ordering is necessarily maintained between writes to each descriptor (though of course ordering can be lost after that point, should anything later in a pipeline do buffering of its own).
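To make the write order concrete, here's a rough shell sketch of the same loop. chunked_tee is a hypothetical helper, not part of coreutils, and it only mimics the ordering described above (read one chunk, write it to stdout, then to each file argument left to right); it makes no attempt to match tee's error handling or options. It assumes dd and mktemp are available.

```shell
# Hypothetical helper (not part of coreutils): copy stdin in chunks of at
# most $1 bytes, writing each chunk to stdout first and then to every
# remaining argument in order, like the loop quoted above.
chunked_tee() (
  chunk_size=$1
  shift
  buf=$(mktemp) || exit 1

  # Truncate the targets up front, as tee does without -a.
  for f in "$@"; do : > "$f"; done

  # dd with count=1 performs a single read of up to chunk_size bytes;
  # an empty buffer file means stdin reached end of file.
  while dd bs="$chunk_size" count=1 of="$buf" 2>/dev/null && [ -s "$buf" ]; do
    cat "$buf"                 # standard output goes first
    for f in "$@"; do
      cat "$buf" >> "$f"       # then each file, left to right
    done
  done

  rm -f "$buf"
)
```

For instance, printf 'a\nb\nc\nd\n' | chunked_tee 3 copy1 copy2 (copy1 and copy2 being made-up file names) handles the input three bytes at a time, and each chunk reaches stdout before copy1, and copy1 before copy2.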


Now: this is not to say that tee copies the complete input to each location before proceeding to the next. Rather, tee works in blocks of BUFSIZ bytes each, where BUFSIZ is a platform-specific constant defined in <stdio.h>, guaranteed to be no less than 256 bytes and frequently in the neighborhood of 8K on modern (non-embedded) Linux. Thus, with significantly larger inputs you'll see interleaving, as you would expect... but in a consistent order, for the reasons given above.
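As a back-of-the-envelope check, you can estimate how many read/write rounds a given input needs. This sketch assumes BUFSIZ is 8192 (a common value on Linux, but check your own system) and that every read fills the buffer, which is the usual case when stdin is a regular file. chunks_for is a made-up helper name.

```shell
# Assumption: BUFSIZ is 8192 bytes; the real value comes from <stdio.h>.
bufsiz=8192

# Number of chunks needed for an input of $1 bytes (ceiling division),
# assuming each read returns a full buffer until the input runs out.
chunks_for() {
  echo $(( ($1 + bufsiz - 1) / bufsiz ))
}

chunks_for 8        # the four-line tmpfile from the question: prints 1
chunks_for 4096     # a file of "a few KB": prints 1
chunks_for 100000   # a larger file: prints 13
```

With a single chunk, all of the input reaches tee's stdout before cat -n is handed anything, which matches the output in the question; interleaving only becomes possible once the input spans several chunks.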