Why does wc output different padding spaces depending on how stdin is connected?

166 Views Asked by cdjc At 05 September 2023 at 00:28

See the following two commands with output:

$ wc < myfile.txt
 4  4 34
$ cat myfile.txt | wc
      4       4      34

My understanding is that both commands connect the stdin of the wc process to the content stream of myfile.txt. So why is the output padded in one case and not in the other? How does wc tell the difference between the two? Isn't it just reading from stdin?

Gordon Davisson On 05 September 2023 at 02:57 BEST ANSWER

Short answer: because with wc < myfile.txt, the wc program has direct access to the file, and can do things besides reading from it. Specifically, it can get the file's size (and it bases the output column width on that). With cat myfile.txt | wc, it can't do that, so it uses wide columns to make sure there's enough room.
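
To see that a process really can tell the two cases apart, here is a minimal sketch of a shell function that reports what kind of object its stdin is. The name stdin_kind_sketch is made up for illustration, and the sketch assumes a system that provides /dev/stdin (Linux does). It is not what wc itself runs, only the same kind of check done from the shell:

```shell
# Hypothetical helper (not part of wc): report what stdin is connected to.
# Assumes /dev/stdin refers to file descriptor 0, as it does on Linux.
stdin_kind_sketch() {
  if [ -f /dev/stdin ]; then
    # stdin is a regular file, so its size can be looked up
    echo "regular file"
  elif [ -p /dev/stdin ]; then
    # stdin is a pipe (FIFO), which has no meaningful size
    echo "pipe"
  else
    # terminal, socket, device, ...
    echo "other"
  fi
}
```

Running stdin_kind_sketch < myfile.txt should report a regular file, while cat myfile.txt | stdin_kind_sketch should report a pipe.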

Long answer: wc tries to print its counts in neatly aligned columns:

$ wc a.txt b.txt 
   6    6   88 a.txt
  60  236 1772 b.txt
  66  242 1860 total

To estimate how wide its columns need to be, the GNU version of wc runs stat() (or fstat()) on all of its input files before actually reading them to get the detailed counts. It uses their sizes to determine how large the line/word/character counts might get, and hence how wide the columns must be to have room for all those digits.

If it can't get the size of even one of the input files (e.g. because that input is not a plain file, but a pipe or something similar), it "assumes the worst" and forces a minimum width of 7 digits. So whenever any of the inputs is a pipe or anything like that, you get columns at least 7 characters wide.

Some examples:

# direct input via stdin
$ wc a.txt - <b.txt
   6    6   88 a.txt
  60  236 1772 -
  66  242 1860 total

# indirect input via cat and a pipe on stdin
$ cat b.txt | wc a.txt -
      6       6      88 a.txt
     60     236    1772 -
     66     242    1860 total

# direct via file descriptor #4
$ wc a.txt /dev/fd/4 4<b.txt
   6    6   88 a.txt
  60  236 1772 /dev/fd/4
  66  242 1860 total

# indirect input via cat and a pipe on FD #63
$ wc a.txt <(cat b.txt)
      6       6      88 a.txt
     60     236    1772 /dev/fd/63
     66     242    1860 total
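
The width rule described above can be sketched as a shell function. This is an approximation for illustration, not GNU wc's actual code: the name wc_width_sketch is made up, it measures sizes with wc -c instead of stat(), and it assumes /dev/stdin is available for the "-" argument.

```shell
# Hypothetical sketch of the column-width rule (not GNU wc's real code).
# Arguments are input names as given to wc; "-" stands for stdin.
wc_width_sketch() {
  total=0       # combined size of all regular-file inputs
  min_width=1   # raised to 7 as soon as one input has no known size
  for f in "$@"; do
    case $f in
      -) f=/dev/stdin ;;
    esac
    if [ -f "$f" ]; then
      # Regular file: its size bounds every count wc could print for it.
      size=$(wc -c < "$f")
      total=$((total + size))
    else
      # Pipe or similar: size unknown, so assume the worst.
      min_width=7
    fi
  done
  # Count the decimal digits needed to print the combined size.
  width=1
  while [ "$total" -ge 10 ]; do
    total=$((total / 10))
    width=$((width + 1))
  done
  if [ "$width" -lt "$min_width" ]; then
    width=$min_width
  fi
  echo "$width"
}
```

With the 88-byte a.txt and 1772-byte b.txt from the examples, wc_width_sketch a.txt b.txt prints 4, matching the 4-character columns above, while cat b.txt | wc_width_sketch a.txt - prints 7.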
