Why do I need to explicitly putchar (' ') in this code?


I am working through Kernighan & Ritchie and have reached exercise 1.9. I wrote some code that appears to solve the exercise, and I have tested it on Windows (with Git Bash and gcc) and on Termux (with clang) by piping in a line with a variable number of spaces, e.g. echo " one two three". The expected output comes out, i.e. one two three.

I got to the solution by trial and error, though it is the same as the solution provided by Lvictor on the clc wiki.

The code I wrote myself is:

#include <stdio.h>
    
/* Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank. */
    
int main () {
    int c;
    
    while ((c = getchar()) != EOF) {
        if (c == ' ') {
            while ((c = getchar()) == ' ' ) {
            }
            putchar (' ');
        }
        putchar (c);
    }
}

What I find perplexing is why I need the putchar (' '); line. I do not understand why the value of c is not ' ' after the program exits the if statement. Originally I did not have this line, but then the program removed all spaces from the input, to my surprise.

As I am a beginner C programmer, maybe there is something I don't understand about the scope of variable values, though it seems to me that if the value of c is ' ' when the if statement starts, and if it is ' ' in the second while loop, it should also be ' ' when both exit, but this does not appear to be the case.

I have Googled K&R exercise 1.9, which is how I found the wiki above, and looked at other questions concerning this exercise on Stack Overflow.


There are 2 best solutions below

John Bollinger On

I do not understand why the value of c is not ' ' after the program exits the if statement

The program enters the body of the if block only in the event that c == ' ', but inside that block, the inner while loop reads new values for c, one at a time, until c is set to a value different from ' '. Therefore, when control leaves that inner while loop, c is guaranteed to not contain a ' '.

You argue,

if the value of c is ' ' when the if statement starts, and if it is ' ' in the second while loop, it should also be ' ' when both exit, but this does not appear to be the case.

The problem with that logic is the "if it is ' ' in the second while loop". That's at best imprecise. What matters for this purpose is not the value of c in the body of the loop (especially for your empty loop), but rather the value of c after the last time the loop's condition is evaluated -- which will be the first time that it evaluates to false. c is updated every time that expression is evaluated. This equivalent form for that loop would be clearer:

            do {
                c = getchar();
            } while (c == ' ');
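
To see this concretely, here is a minimal sketch that runs the same do/while over a string instead of standard input. The function name first_after_blanks is made up for illustration, and the string's terminating '\0' stands in for EOF; neither comes from your program.

```c
/* Illustrative helper, not part of the question's program: applies the
   do/while above to a string and returns the value c holds when the
   loop exits. The terminating '\0' plays the role of EOF here. */
int first_after_blanks(const char *s)
{
    int c;

    do {
        c = (unsigned char)*s++;   /* stands in for c = getchar() */
    } while (c == ' ');            /* false only once c is not a blank */

    return c;                      /* therefore never ' ' */
}
```

For example, first_after_blanks("   x") returns 'x' and first_after_blanks("   ") returns '\0'. Neither result can be ' ', which is why the blank has to be written explicitly before the final putchar (c).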

Fe2O3 On

The two instances of getchar() make it difficult to handle EOF correctly.
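
Concretely, if the input ends inside a run of spaces, the inner loop leaves c equal to EOF, and the final putchar (c) is then called with EOF. A minimal sketch of how the nested-loop version could guard against that is shown here. It is written as a function over FILE * streams so it can be tried on any stream; the name squeeze_nested and its parameters are my own choice, not something from the question.

```c
#include <stdio.h>

/* Hypothetical function name. Same nested-loop idea as the question,
   with an extra EOF check after the inner loop. */
void squeeze_nested(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == ' ') {
            while ((c = getc(in)) == ' ')
                ;                   /* skip the rest of the run */
            putc(' ', out);         /* one blank stands for the whole run */
            if (c == EOF)           /* input ended inside the run */
                break;              /* do not pass EOF to putc() */
        }
        putc(c, out);
    }
}
```

Calling squeeze_nested(stdin, stdout) from main gives the behaviour of the original program, minus the stray character at the end when the input finishes with spaces.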

The following is an alternative that may be easier to understand:

#include <stdio.h>

int main( void ) {
    int c;
    int SPfound = 0;

    while( ( c = getchar() ) != EOF ) // Note single "entry point" for data
        if( c == ' ' ) // swallow and flag one or more contiguous SP's
            SPfound = 1;
        else {
            if( SPfound ) // if any SP's, output a single one.
                putchar( ' ' );
            putchar( c ); // output the non-SP character.
            SPfound = 0;
        }
}

Notice that this would "trim" any trailing spaces fetched prior to EOF. You can consider if you want the program to add a single trailing SP to represent those.
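
If you do want that trailing SP, one possible sketch checks the flag once more after the loop. It is the same flag logic as above, wrapped in a function over FILE * streams so it is easy to try on different inputs; the name squeeze_keep_trailing and its parameters are my own invention.

```c
#include <stdio.h>

/* Hypothetical variant of the flag-based loop that also emits a single
   SP when the input ends inside a run of SP's. */
void squeeze_keep_trailing(FILE *in, FILE *out)
{
    int c;
    int sp_found = 0;

    while ((c = getc(in)) != EOF) {
        if (c == ' ') {
            sp_found = 1;           /* swallow and flag the SP */
        } else {
            if (sp_found)
                putc(' ', out);     /* one SP for the whole run */
            putc(c, out);
            sp_found = 0;
        }
    }
    if (sp_found)                   /* run of SP's was cut off by EOF */
        putc(' ', out);
}
```

From main this would be called as squeeze_keep_trailing(stdin, stdout). Whether the trailing SP is desirable depends on how strictly you read "replacing each string of one or more blanks by a single blank".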

EDIT:
Not happy to leave it, here's another version that uses less code.

#include <stdio.h>

int main( void ) {
    for( int c, prev = 0; ( c = getchar() ) != EOF; prev = c )
        if( !( c == ' ' && prev == ' ' ) ) // output when NOT consecutive SP's
            putchar( c );
}

This outputs the first of any series of SP's encountered, but suppresses the 2nd, 3rd, ...