Behavior of scanf when first format string character is not whitespace


scanf is supposed to consume and discard any character in the format string that is not part of a conversion specifier. But its behavior seems different when a non-whitespace, non-conversion character comes first in the format string. Why?

#include <stdio.h>
#include <stdlib.h>

int main() {
    int num, num2;
    printf("> ");
    while (scanf("> %d %d", &num, &num2)) {
        printf("You entered the number %d %d.\n", num, num2);
        printf("> ");
    }
    return EXIT_SUCCESS;
}

If you build and run this and enter

> 3 4

at the prompt, it prints the message and the repeated prompt and then quits immediately.

So that means that scanf returns 2 the first time and then returns 0 before the user can enter another set of tokens. If you remove the > from the format string, the loop will run until the user enters something not a numeral, which then causes scanf to return 0 - the behavior I would expect.

Also, if I put that same symbol after the first conversion specifier, the loop continues to run as expected. That is, if the format string has, say, "%d > %d", and the user enters

3 > 4

the loop will run again and accept another round of input.

I have not seen any documentation on this behavior.


There are 2 best solutions below


From some documentation on fscanf:

The format string consists of

  • non-whitespace multibyte characters except %: each such character in the format string consumes exactly one identical character from the input stream, or causes the function to fail if the next character on the stream does not compare equal.

While the fscanf specifier %d consumes any and all leading whitespace (see note 1), it does not consume the line feed that follows the number, so '>' fails to match that newline character ('\n') on subsequent iterations.
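The failure can be reproduced without typing anything by using sscanf on a string that looks like what the second iteration sees. This is only a sketch: the helper name match_without_leading_space is made up for illustration, and the buffer contents are an assumption about what is left in stdin after the first line.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the question's code): applies the
   question's format string to a buffer and returns sscanf's result,
   i.e. the number of successful conversions. */
int match_without_leading_space(const char *input, int *a, int *b)
{
    /* The literal '>' must match the very first character of input. */
    return sscanf(input, "> %d %d", a, b);
}
```

Called with "> 3 4\n" it returns 2. Called with "\n> 5 6\n", which starts with the newline left behind by the previous line, it returns 0 because '\n' is not '>'.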

From the same documentation:

  • whitespace characters: any single whitespace character in the format string consumes all available consecutive whitespace characters from the input (determined as if by calling isspace in a loop). Note that there is no difference between "\n", " ", "\t\t", or other whitespace in the format string.

So a leading whitespace in your format specifier will consume the trailing newline character from the input:

#include <stdio.h>

int main(void)
{
    int num, num2;

    while (1) {
        printf("Enter \"> NUM NUM2\": ");

        if (2 != scanf(" > %d %d", &num, &num2))
            break;

        printf("You entered the number %d %d.\n", num, num2);
    }
}

Example session:

Enter "> NUM NUM2": > 1 2
You entered the number 1 2.
Enter "> NUM NUM2": > 3 4
You entered the number 3 4.

Aside: at end of input, scanf returns EOF (a negative int, almost universally -1). That value is truthy, so this loop never terminates:

while (scanf("> %d %d", &num, &num2))

You should explicitly check that the return value of scanf equals the expected number of successful conversions (i.e., 2).
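One way to make that check explicit is to sort the return value into three cases. The following is a rough sketch; the enum, classify_scanf_result, and count_pairs are names invented here, not anything from the standard library.

```c
#include <stdio.h>

/* Hypothetical status codes for illustration. */
enum read_status { READ_OK, READ_MISMATCH, READ_EOF };

/* Classifies the return value of a scanf-family call that was expected
   to perform `expected` conversions. */
enum read_status classify_scanf_result(int rc, int expected)
{
    if (rc == expected)
        return READ_OK;
    if (rc == EOF)
        return READ_EOF;       /* input ended before the first conversion */
    return READ_MISMATCH;      /* zero or only some conversions succeeded */
}

/* Reads "> A B" records until end of input or a mismatch and returns
   how many complete records were read. */
int count_pairs(FILE *in)
{
    int a, b, n = 0;
    while (classify_scanf_result(fscanf(in, " > %d %d", &a, &b), 2) == READ_OK)
        n++;
    return n;
}
```

With this shape the loop stops on EOF as well as on bad input, and the caller can still tell the two apart if it keeps the status around.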


1. As do all format specifiers except %c, %[, and %n (assuming no errors occur).
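The difference between specifiers that skip leading whitespace and those that do not can be checked with sscanf. This is a small sketch with made-up helper names; it also shows why the question's "%d > %d" variant keeps working, since the leading %d swallows the leftover newline itself.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only. */

/* %d skips leading whitespace (including '\n') before converting. */
int read_int_after_ws(const char *input, int *out)
{
    return sscanf(input, "%d", out);
}

/* %c does not skip whitespace: it stores the very next character. */
int read_char_after_ws(const char *input, char *out)
{
    return sscanf(input, "%c", out);
}

/* The format from the question where '>' comes after the first %d. */
int read_pair_with_separator(const char *input, int *a, int *b)
{
    return sscanf(input, "%d > %d", a, b);
}
```

Given "\n 42", read_int_after_ws stores 42 while read_char_after_ws stores the newline. Given "\n3 > 4\n", read_pair_with_separator returns 2 despite the leading newline.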


I generally agree with all the comments, especially the ones which emphasize the fact that scanf is not really a good idea for user input.
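A commonly suggested alternative is to read a whole line with fgets and then parse it with sscanf, so nothing from one line is left over for the next. The sketch below assumes lines shorter than 256 bytes; parse_line and read_pairs are hypothetical names, and this is one possible arrangement rather than the only one.

```c
#include <stdio.h>

/* Hypothetical helper: parses one line of the form "> A B".
   Returns 1 on success, 0 otherwise. */
int parse_line(const char *line, int *a, int *b)
{
    return sscanf(line, " > %d %d", a, b) == 2;
}

/* Reads lines until end of input or the first line that does not parse.
   Returns the number of lines accepted. Assumes lines fit in the buffer. */
int read_pairs(FILE *in)
{
    char line[256];
    int a, b, n = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (!parse_line(line, &a, &b))
            break;
        printf("You entered the number %d %d.\n", a, b);
        n++;
    }
    return n;
}
```

Because fgets consumes the newline along with the rest of the line, each call to parse_line starts from a clean buffer.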

The answer to your question has mostly been given already. Pressing Enter after the first input adds a trailing newline character to the input stream. scanf does not parse it, so it stays there. On the second iteration, scanf tries to match that newline against the first character of the format string, > in this case. The match fails, scanf returns 0, and the loop ends.

As suggested, adding a leading whitespace to the format string makes scanf consume that newline. The next call then waits for new input, which gives the behavior you expected.