Confused understanding a passage about char and int types from K&R's "The C Programming Language"


Concerning this passage from Chapter 1: A Tutorial Introduction in Kernighan and Ritchie: The C Programming Language (I've bolded the specific part that I need clarification on and have elaborated down below):

Given getchar and putchar, you can write a surprising amount of useful code without knowing anything more about input and output. The simplest example is a program that copies its input to its output one character at a time:

    read a character
    while (character is not end-of-file indicator)
        output the character just read
        read a character

Converting this into C gives:

#include <stdio.h>

/* copy input to output; 1st version */
main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}

The relational operator != means "not equal to". What appears to be a character on the keyboard or screen is of course, like everything else, stored internally just as a bit pattern. The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason.

The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for ``end of file''. We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.

My understanding is that Char is a type of Int, but it is just smaller (in the same way that Int16, Int32, and Int64 in other languages are the same kind of type but can represent different magnitudes of numbers).

I get that every character can be represented by an integer of type Char, so why can't the EOF value be represented as a Char? Is it because every single integer in the Char type is already accounted for, and even one more number is too large for the data type?

Any explanation or corrections to my knowledge would be appreciated.


There are 8 answers below.

---

Is it because every single integer in the Char type is already accounted for, and even one more number is too large for the data type?

Yes, that's exactly correct. To be a little more specific, the whole idea is to define EOF as a value that can be distinguished from any value that getchar could possibly have retrieved from the file. Since you can write any possible value of char to the file, you can also read any possible value of char back from the file. For EOF to do its job correctly, it must be something different from any of those values that could have been written to/read from the file. To do that, it must be a value that can't fit in a char.

---

EOF really means the absence of a character, so it cannot itself be a plain character. While an implementation could single out one value from the range of char to mark end of input, using an out-of-range value instead leaves all 256 values usable as valid characters on any platform where char is 8 bits. To hold such an out-of-range value, the function must return an integer type that can represent every value of char plus at least one more.

---

The C standard does guarantee that the return value from getchar() is either a valid character or a distinct code. EOF is not the code for any valid character: it expands to an integer constant expression with type int and a negative value.

---

The issue is that the C standard does not specify the signedness of plain char. So, while a modern implementation provides both signed char and unsigned char, whether plain char behaves as one or the other is implementation-defined (and the early standards actually changed on this, at least twice). The standard does specify (since 1989) that whatever value EOF has, it is negative.

---

The char type can be either signed or unsigned, depending on the implementation, but EOF is commonly defined as -1. If char is unsigned it cannot represent -1, so getchar() is defined to return an int, which is always signed and can therefore represent every possible char value as well as -1 (EOF).

Share and enjoy.

---

My understanding is that Char is a type of Int, but it is just smaller

Yep.

I get that every character can be represented by an integer of type Char, so why can't the EOF value be represented as a Char? Is it because every single integer in the Char type is already accounted for, and even one more number is too large for the data type?

Yep.

---

If you look at the man page for getchar, you can read,

getchar() is equivalent to getc(stdin)

getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.

fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.

The synopsis declares:

SYNOPSIS
   #include <stdio.h>

   int fgetc(FILE *stream);

Therefore c should be declared as an int.

---
  1. By having EOF be outside the range of possible characters, the example code will successfully copy any ("binary") data. There is no chance of EOF being mistaken for a valid value in the middle of the data.

  2. The best C language book is C: A Reference Manual by Harbison and Steele. And I've used them all.